2025-12-04T09:32:17.1571802Z Current runner version: '2.330.0'
2025-12-04T09:32:17.1579224Z Runner name: 'i-03bbda7791efb68ed'
2025-12-04T09:32:17.1580145Z Runner group name: 'default'
2025-12-04T09:32:17.1581201Z Machine name: 'ip-10-0-76-64'
2025-12-04T09:32:17.1584412Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:32:17.1586989Z Contents: read
2025-12-04T09:32:17.1587728Z Metadata: read
2025-12-04T09:32:17.1588348Z ##[endgroup]
2025-12-04T09:32:17.1590877Z Secret source: Actions
2025-12-04T09:32:17.1591837Z Prepare workflow directory
2025-12-04T09:32:17.2182965Z Prepare all required actions
2025-12-04T09:32:17.2231197Z Getting action download info
2025-12-04T09:32:17.5706473Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:32:19.8499724Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:32:34.8048555Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:32:35.1546803Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:32:35.4135266Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:32:35.5943042Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:32:35.8374692Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:32:36.1592452Z Getting action download info
2025-12-04T09:32:36.2871116Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:32:36.5900090Z Getting action download info 2025-12-04T09:32:36.7171641Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e) 2025-12-04T09:32:36.9576703Z Getting action download info 2025-12-04T09:32:37.0751538Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482) 2025-12-04T09:32:37.2915476Z Getting action download info 2025-12-04T09:32:37.5288953Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32) 2025-12-04T09:32:37.5293713Z ##[group] Inputs 2025-12-04T09:32:37.5294164Z build-environment: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:32:37.5301892Z test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": 
"unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T09:32:37.5310111Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:32:37.5311124Z sync-tag: 2025-12-04T09:32:37.5312091Z timeout-minutes: 240 2025-12-04T09:32:37.5312380Z use-gha: 2025-12-04T09:32:37.5312635Z dashboard-tag: 2025-12-04T09:32:37.5312918Z s3-bucket: gha-artifacts 2025-12-04T09:32:37.5313221Z aws-role-to-assume: 2025-12-04T09:32:37.5313882Z disable-monitor: false 2025-12-04T09:32:37.5314240Z monitor-log-interval: 5 2025-12-04T09:32:37.5314592Z monitor-data-collect-interval: 1 2025-12-04T09:32:37.5314980Z ##[endgroup] 2025-12-04T09:32:37.5315770Z Complete job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:32:37.5897954Z A job started hook has been configured by the self-hosted runner administrator 2025-12-04T09:32:37.6013766Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-12-04T09:32:37.6024261Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:32:37.6025026Z ##[endgroup] 2025-12-04T09:32:39.1448433Z Runner Type: linux.g4dn.4xlarge.nvidia.gpu 2025-12-04T09:32:39.1449080Z Instance Type: g4dn.4xlarge 2025-12-04T09:32:39.1449393Z AMI Name: unknown 2025-12-04T09:32:39.1491392Z AMI ID: ami-08982f1c5bf93d976 2025-12-04T09:32:45.3827271Z ##[group]Run 
pytorch/test-infra/.github/actions/setup-ssh@main 2025-12-04T09:32:45.3827792Z with: 2025-12-04T09:32:45.3828431Z github-secret: *** 2025-12-04T09:32:45.3829289Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash 2025-12-04T09:32:45.3830252Z activate-with-label: false 2025-12-04T09:32:45.3830572Z label: with-ssh 2025-12-04T09:32:45.3830860Z remove-existing-keys: true 2025-12-04T09:32:45.3831185Z fail-silently: true 2025-12-04T09:32:45.3831452Z env: 2025-12-04T09:32:45.3831700Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:45.3832016Z ##[endgroup] 2025-12-04T09:32:45.5466988Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info. 2025-12-04T09:32:45.5468752Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys 2025-12-04T09:32:45.5846986Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main 2025-12-04T09:32:45.5847496Z with: 2025-12-04T09:32:45.5847750Z no-sudo: true 2025-12-04T09:32:45.5848027Z submodules: recursive 2025-12-04T09:32:45.5848328Z fetch-depth: 0 2025-12-04T09:32:45.5848614Z env: 2025-12-04T09:32:45.5848860Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:45.5849153Z ##[endgroup] 2025-12-04T09:32:45.5934616Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:32:45.5935773Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:32:45.5946549Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:32:45.5947013Z env: 2025-12-04T09:32:45.5947283Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:45.5947629Z ##[endgroup] 2025-12-04T09:32:45.6037671Z ##[group]Run # Use all available CPUs for fetching 2025-12-04T09:32:45.6038206Z # Use all available CPUs for 
fetching 2025-12-04T09:32:45.6038619Z cd "${GITHUB_WORKSPACE}" 2025-12-04T09:32:45.6039006Z git config --global fetch.parallel 0 2025-12-04T09:32:45.6039686Z git config --global submodule.fetchJobs 0 2025-12-04T09:32:45.6040090Z  2025-12-04T09:32:45.6040503Z # Clean workspace. The default checkout action should also do this, but 2025-12-04T09:32:45.6041071Z # do it here as well just in case 2025-12-04T09:32:45.6041448Z if [[ -d .git ]]; then 2025-12-04T09:32:45.6041809Z  if [ -z "${NO_SUDO}" ]; then 2025-12-04T09:32:45.6042184Z  sudo git clean -ffdx 2025-12-04T09:32:45.6042613Z  else 2025-12-04T09:32:45.6042895Z  git clean -ffdx 2025-12-04T09:32:45.6043212Z  fi 2025-12-04T09:32:45.6043467Z fi 2025-12-04T09:32:45.6050120Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:32:45.6050570Z env: 2025-12-04T09:32:45.6050919Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:45.6051240Z NO_SUDO: true 2025-12-04T09:32:45.6051515Z ##[endgroup] 2025-12-04T09:32:45.6184267Z ##[group]Run actions/checkout@v4 2025-12-04T09:32:45.6184653Z with: 2025-12-04T09:32:45.6184944Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:32:45.6185328Z fetch-depth: 0 2025-12-04T09:32:45.6185606Z submodules: recursive 2025-12-04T09:32:45.6185909Z show-progress: false 2025-12-04T09:32:45.6186223Z repository: pytorch/pytorch 2025-12-04T09:32:45.6186692Z token: *** 2025-12-04T09:32:45.6186952Z ssh-strict: true 2025-12-04T09:32:45.6187229Z ssh-user: git 2025-12-04T09:32:45.6187501Z persist-credentials: true 2025-12-04T09:32:45.6187820Z clean: true 2025-12-04T09:32:45.6188122Z sparse-checkout-cone-mode: true 2025-12-04T09:32:45.6188457Z fetch-tags: false 2025-12-04T09:32:45.6188730Z lfs: false 2025-12-04T09:32:45.6189003Z set-safe-directory: true 2025-12-04T09:32:45.6189303Z env: 2025-12-04T09:32:45.6189551Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:45.6189849Z ##[endgroup] 2025-12-04T09:32:45.7433515Z Syncing repository: pytorch/pytorch 2025-12-04T09:32:45.7435128Z 
##[group]Getting Git version info 2025-12-04T09:32:45.7435740Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T09:32:45.7436542Z [command]/usr/bin/git version 2025-12-04T09:32:45.7634283Z git version 2.50.1 2025-12-04T09:32:45.7663559Z ##[endgroup] 2025-12-04T09:32:45.7674888Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/dbedb87e-7286-4c3b-9e34-21fce791ca44/.gitconfig' 2025-12-04T09:32:45.7694585Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/dbedb87e-7286-4c3b-9e34-21fce791ca44' before making global git config changes 2025-12-04T09:32:45.7695805Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T09:32:45.7700257Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:32:45.7746617Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch' 2025-12-04T09:32:45.7749932Z ##[group]Initializing the repository 2025-12-04T09:32:45.7754496Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:32:45.7817587Z hint: Using 'master' as the name for the initial branch. This default branch name 2025-12-04T09:32:45.7818312Z hint: is subject to change. To configure the initial branch name to use in all 2025-12-04T09:32:45.7818995Z hint: of your new repositories, which will suppress this warning, call: 2025-12-04T09:32:45.7819476Z hint: 2025-12-04T09:32:45.7819822Z hint: git config --global init.defaultBranch 2025-12-04T09:32:45.7820238Z hint: 2025-12-04T09:32:45.7820613Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2025-12-04T09:32:45.7821312Z hint: 'development'. 
The just-created branch can be renamed via this command: 2025-12-04T09:32:45.7821852Z hint: 2025-12-04T09:32:45.7822093Z hint: git branch -m 2025-12-04T09:32:45.7822405Z hint: 2025-12-04T09:32:45.7822836Z hint: Disable this message with "git config set advice.defaultBranchName false" 2025-12-04T09:32:45.7826976Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/ 2025-12-04T09:32:45.7836956Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch 2025-12-04T09:32:45.7877034Z ##[endgroup] 2025-12-04T09:32:45.7877549Z ##[group]Disabling automatic garbage collection 2025-12-04T09:32:45.7881330Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T09:32:45.7910139Z ##[endgroup] 2025-12-04T09:32:45.7910703Z ##[group]Setting up auth 2025-12-04T09:32:45.7917074Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T09:32:45.7946009Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T09:32:45.8314116Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T09:32:45.8344416Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T09:32:45.8666767Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T09:32:45.8697670Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T09:32:45.9008780Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T09:32:45.9064590Z ##[endgroup] 2025-12-04T09:32:45.9065134Z ##[group]Fetching 
the repository 2025-12-04T09:32:45.9074250Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T09:33:41.3065034Z From https://github.com/pytorch/pytorch 2025-12-04T09:33:41.3065625Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T09:33:41.3066357Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T09:33:41.3067055Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T09:33:41.3067828Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T09:33:41.3068838Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 2025-12-04T09:33:41.3070155Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T09:33:41.3072678Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T09:33:41.3074962Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T09:33:41.3076049Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T09:33:41.3077620Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T09:33:41.3078743Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T09:33:41.3080004Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T09:33:41.3081338Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T09:33:41.3083085Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-12-04T09:33:41.3084190Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T09:33:41.3085861Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T09:33:41.3087801Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T09:33:41.3089629Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T09:33:41.3090833Z * [new branch] adi/test -> origin/adi/test 
2025-12-04T09:33:41.3092179Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T09:33:41.3093599Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T09:33:41.3094862Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T09:33:41.3096131Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T09:33:41.3097580Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T09:33:41.3098687Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T09:33:41.3100474Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T09:33:41.3103178Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T09:33:41.3104401Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T09:33:41.3106088Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T09:33:41.3107196Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T09:33:41.3109094Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T09:33:41.3110388Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T09:33:41.3111631Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T09:33:41.3112964Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T09:33:41.3114096Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T09:33:41.3115416Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T09:33:41.3116575Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T09:33:41.3118397Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T09:33:41.3120136Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T09:33:41.3121356Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 
2025-12-04T09:33:41.3122984Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T09:33:41.3124228Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T09:33:41.3125796Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T09:33:41.3126865Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T09:33:41.3128170Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T09:33:41.3129468Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T09:33:41.3130982Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T09:33:41.3132209Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T09:33:41.3133520Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T09:33:41.3134828Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T09:33:41.3136282Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T09:33:41.3137462Z * [new branch] aoti_const_device -> origin/aoti_const_device 2025-12-04T09:33:41.3138769Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T09:33:41.3140043Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T09:33:41.3141298Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T09:33:41.3143801Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T09:33:41.3144831Z * [new branch] async_tp -> origin/async_tp 2025-12-04T09:33:41.3146440Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T09:33:41.3147684Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T09:33:41.3149019Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T09:33:41.3150518Z * [new branch] atalman-patch-3 -> 
origin/atalman-patch-3 2025-12-04T09:33:41.3151756Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T09:33:41.3153289Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T09:33:41.3154542Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T09:33:41.3156027Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T09:33:41.3157431Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T09:33:41.3158659Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T09:33:41.3159991Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T09:33:41.3161484Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T09:33:41.3163104Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T09:33:41.3164868Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T09:33:41.3165971Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T09:33:41.3167277Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T09:33:41.3168774Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T09:33:41.3170623Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T09:33:41.3172224Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T09:33:41.3173413Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T09:33:41.3174830Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T09:33:41.3176082Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T09:33:41.3177968Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T09:33:41.3179704Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T09:33:41.3181484Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 
2025-12-04T09:33:41.3182551Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T09:33:41.3183814Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T09:33:41.3185030Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T09:33:41.3186560Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T09:33:41.3187674Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T09:33:41.3188893Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T09:33:41.3190864Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T09:33:41.3192417Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T09:33:41.3193489Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T09:33:41.3194682Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T09:33:41.3196036Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T09:33:41.3197251Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T09:33:41.3198646Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T09:33:41.3199990Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T09:33:41.3201601Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T09:33:41.3203081Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T09:33:41.3204505Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T09:33:41.3205759Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T09:33:41.3207004Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T09:33:41.3208343Z * [new branch] bf/transformer-pin-4-57-3 -> 
origin/bf/transformer-pin-4-57-3 2025-12-04T09:33:41.3209728Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T09:33:41.3210972Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T09:33:41.3212227Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T09:33:41.3213470Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T09:33:41.3214677Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T09:33:41.3215959Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T09:33:41.3217171Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T09:33:41.3218384Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T09:33:41.3219711Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T09:33:41.3221204Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T09:33:41.3222333Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T09:33:41.3223571Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T09:33:41.3224909Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T09:33:41.3226117Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T09:33:41.3227360Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T09:33:41.3228595Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T09:33:41.3230543Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T09:33:41.3231761Z * [new branch] 
brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T09:33:41.3233128Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T09:33:41.3234317Z * [new branch] bwd-backup -> origin/bwd-backup 2025-12-04T09:33:41.3235836Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T09:33:41.3236999Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T09:33:41.3238260Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T09:33:41.3240263Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T09:33:41.3241562Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T09:33:41.3243294Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3244534Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3245968Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3247369Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3248724Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3250240Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3251528Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3252887Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3254338Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3255699Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> 
origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3257057Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3258371Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3259688Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3261054Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3262489Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3263771Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3265114Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3266481Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3267840Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T09:33:41.3269031Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T09:33:41.3270278Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T09:33:41.3271827Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T09:33:41.3273068Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T09:33:41.3274302Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T09:33:41.3275594Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T09:33:41.3276934Z * [new branch] ci_attn -> origin/ci_attn 2025-12-04T09:33:41.3278340Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T09:33:41.3280498Z * [new branch] 
codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T09:33:41.3281534Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T09:33:41.3283600Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T09:33:41.3285011Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T09:33:41.3286051Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T09:33:41.3287497Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T09:33:41.3288932Z * [new branch] context_test -> origin/context_test 2025-12-04T09:33:41.3290936Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T09:33:41.3292408Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T09:33:41.3293959Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T09:33:41.3295824Z * [new branch] crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering 2025-12-04T09:33:41.3297383Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T09:33:41.3298554Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T09:33:41.3299823Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T09:33:41.3301304Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T09:33:41.3303138Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T09:33:41.3304227Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T09:33:41.3305883Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T09:33:41.3307532Z * [new branch] csl/lint_testing -> 
origin/csl/lint_testing 2025-12-04T09:33:41.3309166Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T09:33:41.3310597Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T09:33:41.3311876Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T09:33:41.3313126Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T09:33:41.3314551Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T09:33:41.3315782Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T09:33:41.3317058Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T09:33:41.3318355Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T09:33:41.3319818Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T09:33:41.3321086Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T09:33:41.3322322Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T09:33:41.3323779Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T09:33:41.3325013Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T09:33:41.3326289Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T09:33:41.3327481Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T09:33:41.3328842Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T09:33:41.3330270Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T09:33:41.3331478Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T09:33:41.3332762Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T09:33:41.3334015Z * [new branch] 
csl/win_sccache -> origin/csl/win_sccache 2025-12-04T09:33:41.3335246Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T09:33:41.3336722Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T09:33:41.3338519Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T09:33:41.3339740Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T09:33:41.3341615Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T09:33:41.3343402Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T09:33:41.3344599Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T09:33:41.3346094Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T09:33:41.3350507Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T09:33:41.3352122Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T09:33:41.3353461Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T09:33:41.3354842Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T09:33:41.3356916Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T09:33:41.3359030Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T09:33:41.3360612Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T09:33:41.3362298Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T09:33:41.3363684Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T09:33:41.3364986Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T09:33:41.3366687Z * [new branch] 
dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T09:33:41.3368264Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T09:33:41.3370114Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T09:33:41.3371877Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T09:33:41.3373882Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T09:33:41.3375420Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T09:33:41.3377035Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T09:33:41.3378375Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T09:33:41.3379619Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T09:33:41.3381116Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T09:33:41.3382244Z * [new branch] docs -> origin/docs 2025-12-04T09:33:41.3383741Z * [new branch] documentation -> origin/documentation 2025-12-04T09:33:41.3384934Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T09:33:41.3386891Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T09:33:41.3388030Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T09:33:41.3389209Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T09:33:41.3390537Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T09:33:41.3392038Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T09:33:41.3393433Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T09:33:41.3394656Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T09:33:41.3396101Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T09:33:41.3397253Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T09:33:41.3399219Z * [new 
branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T09:33:41.3400726Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T09:33:41.3402082Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T09:33:41.3403558Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T09:33:41.3404936Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T09:33:41.3406522Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T09:33:41.3408331Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T09:33:41.3409378Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T09:33:41.3410997Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T09:33:41.3412153Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T09:33:41.3413415Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T09:33:41.3415674Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T09:33:41.3416652Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T09:33:41.3417742Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T09:33:41.3419271Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T09:33:41.3420583Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T09:33:41.3421936Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 
2025-12-04T09:33:41.3423228Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T09:33:41.3424655Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T09:33:41.3426015Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T09:33:41.3427169Z * [new branch] exec -> origin/exec 2025-12-04T09:33:41.3428834Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T09:33:41.3430118Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T09:33:41.3431540Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T09:33:41.3433001Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T09:33:41.3434212Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T09:33:41.3435505Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T09:33:41.3436788Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T09:33:41.3438300Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T09:33:41.3439520Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T09:33:41.3440776Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T09:33:41.3442056Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T09:33:41.3443866Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T09:33:41.3445195Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T09:33:41.3446460Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T09:33:41.3447855Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T09:33:41.3449075Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T09:33:41.3450380Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T09:33:41.3451805Z * [new branch] 
export-D83769447 -> origin/export-D83769447 2025-12-04T09:33:41.3453018Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T09:33:41.3454281Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T09:33:41.3456313Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T09:33:41.3457658Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T09:33:41.3458888Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T09:33:41.3460161Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T09:33:41.3461621Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T09:33:41.3462830Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T09:33:41.3464391Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T09:33:41.3465873Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T09:33:41.3467126Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T09:33:41.3468583Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T09:33:41.3469998Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T09:33:41.3471538Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T09:33:41.3472745Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T09:33:41.3474011Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T09:33:41.3475245Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T09:33:41.3477252Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T09:33:41.3478293Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T09:33:41.3480235Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T09:33:41.3481461Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T09:33:41.3483587Z * [new 
branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T09:33:41.3485011Z * [new branch] fca -> origin/fca 2025-12-04T09:33:41.3486229Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T09:33:41.3487676Z * [new branch] fca5 -> origin/fca5 2025-12-04T09:33:41.3490015Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T09:33:41.3491284Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T09:33:41.3493139Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T09:33:41.3494318Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T09:33:41.3496241Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T09:33:41.3497473Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T09:33:41.3498777Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T09:33:41.3499980Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T09:33:41.3501369Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T09:33:41.3502868Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T09:33:41.3504014Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T09:33:41.3505158Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T09:33:41.3506675Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T09:33:41.3508052Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T09:33:41.3509170Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T09:33:41.3510635Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T09:33:41.3512427Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T09:33:41.3513658Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 
2025-12-04T09:33:41.3514879Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T09:33:41.3516108Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T09:33:41.3517366Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T09:33:41.3518801Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T09:33:41.3520163Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T09:33:41.3521374Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T09:33:41.3522844Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T09:33:41.3524481Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T09:33:41.3525911Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T09:33:41.3527088Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T09:33:41.3529105Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T09:33:41.3530360Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T09:33:41.3531620Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T09:33:41.3533011Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T09:33:41.3534367Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T09:33:41.3536183Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T09:33:41.3537646Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T09:33:41.3539812Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T09:33:41.3541528Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T09:33:41.3544319Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T09:33:41.3545551Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T09:33:41.3547899Z * [new branch] 
gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T09:33:41.3549097Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T09:33:41.3551747Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T09:33:41.3553004Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T09:33:41.3555341Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T09:33:41.3556589Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T09:33:41.3557946Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T09:33:41.3559727Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T09:33:41.3560941Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T09:33:41.3562394Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T09:33:41.3585649Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T09:33:41.3586331Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T09:33:41.3586967Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T09:33:41.3587591Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T09:33:41.3588324Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T09:33:41.3588951Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T09:33:41.3589570Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T09:33:41.3590175Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T09:33:41.3590791Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T09:33:41.3591418Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T09:33:41.3592037Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T09:33:41.3592642Z * [new branch] gh/H-Huang/228/orig 
-> origin/gh/H-Huang/228/orig 2025-12-04T09:33:41.3593386Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T09:33:41.3594103Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T09:33:41.3594801Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T09:33:41.3595515Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T09:33:41.3596222Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T09:33:41.3596944Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T09:33:41.3597639Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T09:33:41.3598351Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T09:33:41.3599064Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T09:33:41.3599771Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T09:33:41.3600466Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T09:33:41.3601395Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T09:33:41.3602104Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T09:33:41.3602887Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T09:33:41.3603585Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T09:33:41.3605346Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T09:33:41.3606506Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T09:33:41.3607813Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T09:33:41.3609717Z * [new branch] gh/IvanKobzarev/167/base -> 
origin/gh/IvanKobzarev/167/base 2025-12-04T09:33:41.3610868Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T09:33:41.3612165Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T09:33:41.3614001Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T09:33:41.3615317Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T09:33:41.3616442Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T09:33:41.3618336Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T09:33:41.3619571Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T09:33:41.3620834Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T09:33:41.3622580Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T09:33:41.3623776Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T09:33:41.3625055Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T09:33:41.3627154Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T09:33:41.3628404Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T09:33:41.3629712Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T09:33:41.3631542Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T09:33:41.3632878Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T09:33:41.3634138Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T09:33:41.3635977Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T09:33:41.3637164Z * [new branch] gh/IvanKobzarev/173/head -> 
origin/gh/IvanKobzarev/173/head 2025-12-04T09:33:41.3638479Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T09:33:41.3640385Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T09:33:41.3641651Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T09:33:41.3643083Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T09:33:41.3644957Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T09:33:41.3646308Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T09:33:41.3647671Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T09:33:41.3649642Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T09:33:41.3650924Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T09:33:41.3652171Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T09:33:41.3654364Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T09:33:41.3655927Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T09:33:41.3657194Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T09:33:41.3659186Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T09:33:41.3660554Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T09:33:41.3661868Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T09:33:41.3663792Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T09:33:41.3664922Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T09:33:41.3666302Z * [new branch] gh/IvanKobzarev/179/orig -> 
origin/gh/IvanKobzarev/179/orig 2025-12-04T09:33:41.3668104Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T09:33:41.3669317Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T09:33:41.3670584Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T09:33:41.3672650Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T09:33:41.3673912Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T09:33:41.3675237Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T09:33:41.3677341Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T09:33:41.3678555Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T09:33:41.3679830Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T09:33:41.3681920Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T09:33:41.3683368Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T09:33:41.3684727Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T09:33:41.3686605Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T09:33:41.3687863Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T09:33:41.3689162Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T09:33:41.3691386Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T09:33:41.3692764Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T09:33:41.3694395Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T09:33:41.3695566Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 
2025-12-04T09:33:41.3697736Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T09:33:41.3699210Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T09:33:41.3701123Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T09:33:41.3702496Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T09:33:41.3703816Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T09:33:41.3705948Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T09:33:41.3707189Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T09:33:41.3708573Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T09:33:41.3710315Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T09:33:41.3711522Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T09:33:41.3712957Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T09:33:41.3714642Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T09:33:41.3715855Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T09:33:41.3717140Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T09:33:41.3718904Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T09:33:41.3720266Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T09:33:41.3721391Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T09:33:41.3723288Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T09:33:41.3724450Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T09:33:41.3725805Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T09:33:41.3727538Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T09:33:41.3728686Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 
2025-12-04T09:33:41.3729919Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T09:33:41.3731685Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T09:33:41.3732892Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T09:33:41.3734272Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T09:33:41.3735943Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T09:33:41.3737390Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T09:33:41.3738674Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T09:33:41.3740481Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T09:33:41.3742132Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T09:33:41.3743175Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T09:33:41.3744594Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T09:33:41.3746294Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T09:33:41.3747316Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T09:33:41.3748583Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T09:33:41.3750447Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T09:33:41.3751481Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T09:33:41.3752862Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T09:33:41.3754516Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T09:33:41.3755605Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T09:33:41.3757402Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T09:33:41.3759496Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T09:33:41.3760800Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T09:33:41.3762123Z * [new 
branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-12-04T09:33:41.3764130Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T09:33:41.3765412Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T09:33:41.3766696Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T09:33:41.3768871Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T09:33:41.3772108Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T09:33:41.3773494Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T09:33:41.3774170Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T09:33:41.3774865Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T09:33:41.3775569Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T09:33:41.3777460Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T09:33:41.3778634Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T09:33:41.3779880Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T09:33:41.3781718Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T09:33:41.3782952Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T09:33:41.3784785Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T09:33:41.3786021Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T09:33:41.3787290Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T09:33:41.3788981Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T09:33:41.3790158Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T09:33:41.3792045Z * [new 
branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T09:33:41.3793203Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T09:33:41.3794396Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T09:33:41.3796207Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T09:33:41.3797620Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T09:33:41.3799165Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T09:33:41.3801160Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T09:33:41.3802802Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T09:33:41.3803997Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T09:33:41.3805645Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T09:33:41.3806867Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T09:33:41.3808123Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T09:33:41.3810227Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T09:33:41.3811447Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T09:33:41.3813824Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T09:33:41.3815129Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 2025-12-04T09:33:41.3817011Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T09:33:41.3818246Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T09:33:41.3819630Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T09:33:41.3821256Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 
2025-12-04T09:33:41.3822493Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T09:33:41.3823909Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T09:33:41.3825282Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T09:33:41.3826562Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T09:33:41.3827824Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T09:33:41.3829699Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T09:33:41.3830965Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T09:33:41.3832320Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T09:33:41.3834014Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T09:33:41.3835228Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T09:33:41.3836469Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T09:33:41.3838431Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T09:33:41.3839669Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T09:33:41.3840984Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T09:33:41.3842682Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T09:33:41.3844018Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T09:33:41.3845371Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T09:33:41.3847036Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T09:33:41.3848247Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 
2025-12-04T09:33:41.3849930Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T09:33:41.3851225Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T09:33:41.3852355Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T09:33:41.3854430Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T09:33:41.3855696Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T09:33:41.3856909Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T09:33:41.3858517Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T09:33:41.3859694Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T09:33:41.3861381Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T09:33:41.3862513Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T09:33:41.3864261Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T09:33:41.3865414Z * [new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T09:33:41.3868044Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T09:33:41.3869606Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T09:33:41.3871174Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T09:33:41.3872995Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T09:33:41.3875140Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T09:33:41.3876243Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T09:33:41.3878119Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T09:33:41.3879334Z * [new 
branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T09:33:41.3880989Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T09:33:41.3882255Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T09:33:41.3884127Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T09:33:41.3885267Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T09:33:41.3886574Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T09:33:41.3888896Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T09:33:41.3890064Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T09:33:41.3891381Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T09:33:41.3893162Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T09:33:41.3894623Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T09:33:41.3895828Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T09:33:41.3897802Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T09:33:41.3898944Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T09:33:41.3900370Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T09:33:41.3902400Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T09:33:41.3903659Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T09:33:41.3904920Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T09:33:41.3906544Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T09:33:41.3907737Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T09:33:41.3909170Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T09:33:41.3911048Z * [new 
branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T09:33:41.3912205Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T09:33:41.3913676Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T09:33:41.3915298Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T09:33:41.3916567Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T09:33:41.3917884Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T09:33:41.3919646Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T09:33:41.3920892Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 2025-12-04T09:33:41.3922257Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T09:33:41.3924170Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T09:33:41.3925387Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T09:33:41.3926813Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T09:33:41.3928891Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T09:33:41.3930175Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T09:33:41.3931440Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T09:33:41.3933279Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T09:33:41.3934496Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T09:33:41.3935973Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T09:33:41.3937575Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T09:33:41.3939107Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T09:33:41.3940425Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T09:33:41.3942240Z * 
[new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T09:33:41.3943480Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T09:33:41.3944734Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T09:33:41.3946602Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T09:33:41.3947867Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T09:33:41.3949204Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T09:33:41.3950882Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T09:33:41.3952074Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T09:33:41.3953377Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T09:33:41.3955208Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T09:33:41.3956413Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T09:33:41.3957669Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T09:33:41.3959440Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T09:33:41.3960639Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T09:33:41.3961898Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T09:33:41.3963868Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T09:33:41.3965155Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T09:33:41.3966473Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T09:33:41.3968320Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T09:33:41.3969493Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T09:33:41.3970775Z * [new branch] gh/XuehaiPan/348/orig -> 
origin/gh/XuehaiPan/348/orig 2025-12-04T09:33:41.3972556Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T09:33:41.3973806Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T09:33:41.3975039Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T09:33:41.3976974Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T09:33:41.3978087Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T09:33:41.3979358Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T09:33:41.3981211Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T09:33:41.3982398Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T09:33:41.3984661Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T09:33:41.3985888Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T09:33:41.3987396Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T09:33:41.3989031Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T09:33:41.3990233Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T09:33:41.3991517Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T09:33:41.3993292Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T09:33:41.3994494Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T09:33:41.3995735Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T09:33:41.3997543Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T09:33:41.3998726Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T09:33:41.4000062Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 
2025-12-04T09:33:41.4005124Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T09:33:41.4006362Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T09:33:41.4007675Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T09:33:41.4009498Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T09:33:41.4010746Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T09:33:41.4011988Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T09:33:41.4013849Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T09:33:41.4015082Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T09:33:41.4016340Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T09:33:41.4018105Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T09:33:41.4019323Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T09:33:41.4020632Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T09:33:41.4022577Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T09:33:41.4023761Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T09:33:41.4025042Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T09:33:41.4027168Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T09:33:41.4028397Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T09:33:41.4029696Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T09:33:41.4031667Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T09:33:41.4032833Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 
2025-12-04T09:33:41.4034549Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T09:33:41.4035688Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T09:33:41.4037631Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T09:33:41.4038854Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T09:33:41.4040635Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T09:33:41.4041826Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T09:33:41.4043850Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T09:33:41.4045071Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T09:33:41.4046807Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T09:33:41.4047952Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T09:33:41.4049739Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T09:33:41.4050807Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T09:33:41.4052103Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T09:33:41.4054445Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T09:33:41.4055710Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T09:33:41.4057430Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T09:33:41.4058658Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T09:33:41.4060528Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T09:33:41.4061656Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T09:33:41.4062950Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 
2025-12-04T09:33:41.4065007Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T09:33:41.4066180Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T09:33:41.4067561Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T09:33:41.4069782Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T09:33:41.4071611Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T09:33:41.4072840Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T09:33:41.4074166Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T09:33:41.4075989Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T09:33:41.4077150Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T09:33:41.4078501Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T09:33:41.4080305Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T09:33:41.4081533Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T09:33:41.4083027Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T09:33:41.4085278Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-12-04T09:33:41.4086449Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T09:33:41.4087746Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T09:33:41.4090109Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T09:33:41.4091409Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T09:33:41.4092864Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T09:33:41.4094779Z * [new branch] gh/andrewor14/50/base -> 
origin/gh/andrewor14/50/base 2025-12-04T09:33:41.4096096Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T09:33:41.4097567Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T09:33:41.4099748Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T09:33:41.4101499Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T09:33:41.4103513Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T09:33:41.4105010Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T09:33:41.4106814Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T09:33:41.4108095Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T09:33:41.4109417Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T09:33:41.4111470Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T09:33:41.4112651Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T09:33:41.4113913Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T09:33:41.4115838Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T09:33:41.4117147Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T09:33:41.4118452Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T09:33:41.4120693Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T09:33:41.4121854Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T09:33:41.4123808Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T09:33:41.4125119Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T09:33:41.4126392Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 
2025-12-04T09:33:41.4128128Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T09:33:41.4129306Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T09:33:41.4130728Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T09:33:41.4132568Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T09:33:41.4133703Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T09:33:41.4134963Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T09:33:41.4137042Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T09:33:41.4138322Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T09:33:41.4139484Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T09:33:41.4141355Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T09:33:41.4142537Z * [new branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T09:33:41.4143807Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T09:33:41.4145696Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T09:33:41.4146877Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T09:33:41.4148152Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T09:33:41.4150315Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T09:33:41.4151784Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T09:33:41.4153217Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T09:33:41.4154959Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T09:33:41.4156214Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T09:33:41.4157481Z * [new branch] gh/angelayi/133/orig -> 
origin/gh/angelayi/133/orig 2025-12-04T09:33:41.4159613Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T09:33:41.4161041Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T09:33:41.4162329Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T09:33:41.4164474Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T09:33:41.4165741Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T09:33:41.4167036Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T09:33:41.4168777Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T09:33:41.4170028Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T09:33:41.4171287Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T09:33:41.4173184Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T09:33:41.4174316Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T09:33:41.4175870Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T09:33:41.4177499Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T09:33:41.4178636Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T09:33:41.4180120Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T09:33:41.4181828Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T09:33:41.4183094Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T09:33:41.4184364Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T09:33:41.4186200Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T09:33:41.4187522Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T09:33:41.4188821Z * [new branch] 
gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T09:33:41.4191653Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T09:33:41.4192502Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T09:33:41.4193764Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T09:33:41.4195615Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T09:33:41.4196831Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T09:33:41.4198125Z * [new branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T09:33:41.4199892Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T09:33:41.4201236Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T09:33:41.4202852Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T09:33:41.4204658Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T09:33:41.4206099Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T09:33:41.4207300Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T09:33:41.4209778Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T09:33:41.4210970Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T09:33:41.4212216Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T09:33:41.4214160Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T09:33:41.4215414Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T09:33:41.4217076Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T09:33:41.4218537Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T09:33:41.4220105Z * [new branch] gh/anijain2305/854/head -> 
origin/gh/anijain2305/854/head 2025-12-04T09:33:41.4221289Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T09:33:41.4223212Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T09:33:41.4224413Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T09:33:41.4225672Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T09:33:41.4227669Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T09:33:41.4228799Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T09:33:41.4230116Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T09:33:41.4232022Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T09:33:41.4233158Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T09:33:41.4234431Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T09:33:41.4236267Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T09:33:41.4237449Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T09:33:41.4238780Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T09:33:41.4240646Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T09:33:41.4241877Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T09:33:41.4243486Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T09:33:41.4245219Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T09:33:41.4246435Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T09:33:41.4247757Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T09:33:41.4249666Z * 
[new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T09:33:41.4250940Z * [new branch] gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T09:33:41.4252236Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T09:33:41.4254040Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T09:33:41.4255387Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T09:33:41.4256646Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T09:33:41.4258485Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T09:33:41.4259841Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T09:33:41.4261161Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T09:33:41.4263048Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T09:33:41.4264228Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T09:33:41.4265492Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T09:33:41.4267322Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T09:33:41.4268610Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T09:33:41.4269847Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T09:33:41.4271666Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T09:33:41.4272940Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T09:33:41.4274400Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T09:33:41.4276136Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T09:33:41.4277302Z * [new branch] gh/anijain2305/943/head -> 
origin/gh/anijain2305/943/head 2025-12-04T09:33:41.4278637Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T09:33:41.4281138Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T09:33:41.4282455Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T09:33:41.4284666Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T09:33:41.4286507Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T09:33:41.4287804Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T09:33:41.4289091Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T09:33:41.4291004Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T09:33:41.4292204Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T09:33:41.4293465Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T09:33:41.4295468Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T09:33:41.4296496Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T09:33:41.4297789Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T09:33:41.4299984Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T09:33:41.4301147Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T09:33:41.4302471Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T09:33:41.4304298Z * [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T09:33:41.4305484Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T09:33:41.4306762Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T09:33:41.4308649Z * 
[new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T09:33:41.4309879Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T09:33:41.4311430Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T09:33:41.4313241Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T09:33:41.4314457Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T09:33:41.4315796Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T09:33:41.4317733Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T09:33:41.4318981Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T09:33:41.4320260Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T09:33:41.4322093Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T09:33:41.4323407Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T09:33:41.4324660Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T09:33:41.4326545Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T09:33:41.4327835Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T09:33:41.4329126Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T09:33:41.4331019Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T09:33:41.4332332Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T09:33:41.4333580Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T09:33:41.4335606Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T09:33:41.4337142Z * [new branch] gh/anijain2305/956/head -> 
origin/gh/anijain2305/956/head 2025-12-04T09:33:41.4338101Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T09:33:41.4340068Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T09:33:41.4341321Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T09:33:41.4342611Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T09:33:41.4344421Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T09:33:41.4345849Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T09:33:41.4347047Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T09:33:41.4348888Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T09:33:41.4350098Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T09:33:41.4351401Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T09:33:41.4353429Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T09:33:41.4354729Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 2025-12-04T09:33:41.4356000Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T09:33:41.4357943Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T09:33:41.4359190Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T09:33:41.4360505Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T09:33:41.4362413Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T09:33:41.4363651Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T09:33:41.4364956Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T09:33:41.4367257Z * 
[new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T09:33:41.4368743Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T09:33:41.4370021Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T09:33:41.4371905Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T09:33:41.4373192Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T09:33:41.4374493Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T09:33:41.4376322Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T09:33:41.4377530Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T09:33:41.4379048Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T09:33:41.4381137Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T09:33:41.4382487Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T09:33:41.4383645Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T09:33:41.4385521Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T09:33:41.4386752Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T09:33:41.4388284Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T09:33:41.4389991Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T09:33:41.4391282Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T09:33:41.4392581Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T09:33:41.4394358Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T09:33:41.4395631Z * [new branch] gh/anijain2305/969/head -> 
origin/gh/anijain2305/969/head 2025-12-04T09:33:41.4397054Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T09:33:41.4398852Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T09:33:41.4400217Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T09:33:41.4401730Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T09:33:41.4404109Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T09:33:41.4405295Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T09:33:41.4406593Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T09:33:41.4409131Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-12-04T09:33:41.4410346Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T09:33:41.4412109Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T09:33:41.4413687Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T09:33:41.4414782Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T09:33:41.4416030Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T09:33:41.4417661Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T09:33:41.4418777Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T09:33:41.4420404Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T09:33:41.4421617Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T09:33:41.4423661Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T09:33:41.4424900Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T09:33:41.4426923Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T09:33:41.4428109Z * [new 
branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T09:33:41.4429763Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T09:33:41.4431020Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T09:33:41.4432272Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T09:33:41.4433928Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T09:33:41.4435111Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T09:33:41.4436381Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T09:33:41.4438432Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T09:33:41.4440107Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T09:33:41.4441252Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T09:33:41.4443586Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T09:33:41.4444721Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T09:33:41.4446149Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T09:33:41.4447857Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T09:33:41.4449109Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T09:33:41.4450960Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T09:33:41.4452607Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T09:33:41.4453880Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T09:33:41.4455151Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T09:33:41.4457004Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T09:33:41.4458268Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 
2025-12-04T09:33:41.4459572Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T09:33:41.4461485Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T09:33:41.4462711Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T09:33:41.4464000Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 2025-12-04T09:33:41.4466354Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T09:33:41.4467575Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T09:33:41.4469592Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T09:33:41.4470989Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T09:33:41.4472265Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T09:33:41.4474224Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T09:33:41.4475494Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T09:33:41.4476812Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T09:33:41.4478723Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T09:33:41.4479916Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T09:33:41.4482136Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T09:33:41.4483451Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T09:33:41.4485036Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T09:33:41.4486883Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T09:33:41.4488192Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T09:33:41.4489498Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T09:33:41.4491488Z * [new branch] gh/aorenste/147/base -> 
origin/gh/aorenste/147/base 2025-12-04T09:33:41.4492836Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T09:33:41.4494281Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T09:33:41.4496137Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T09:33:41.4497372Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T09:33:41.4498848Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T09:33:41.4500584Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T09:33:41.4505043Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T09:33:41.4506218Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T09:33:41.4508237Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T09:33:41.4509340Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T09:33:41.4510803Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T09:33:41.4512373Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T09:33:41.4513593Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T09:33:41.4514906Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T09:33:41.4516805Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T09:33:41.4517944Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T09:33:41.4519426Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T09:33:41.4521056Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T09:33:41.4522269Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T09:33:41.4523666Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 2025-12-04T09:33:41.4525352Z * [new branch] 
gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T09:33:41.4527063Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T09:33:41.4527927Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T09:33:41.4529561Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T09:33:41.4530786Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T09:33:41.4531995Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T09:33:41.4533811Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T09:33:41.4534823Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T09:33:41.4536081Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T09:33:41.4538228Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T09:33:41.4539470Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T09:33:41.4540753Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T09:33:41.4542426Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T09:33:41.4543655Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T09:33:41.4544820Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T09:33:41.4546533Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T09:33:41.4547761Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T09:33:41.4548918Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T09:33:41.4551102Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T09:33:41.4552453Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T09:33:41.4554090Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 
2025-12-04T09:33:41.4555297Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T09:33:41.4556494Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T09:33:41.4559104Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T09:33:41.4560309Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T09:33:41.4561565Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T09:33:41.4563503Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T09:33:41.4564718Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T09:33:41.4566036Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T09:33:41.4568040Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T09:33:41.4569247Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T09:33:41.4570711Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T09:33:41.4572645Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T09:33:41.4573935Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T09:33:41.4575224Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T09:33:41.4577143Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T09:33:41.4578269Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 2025-12-04T09:33:41.4579532Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T09:33:41.4581624Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T09:33:41.4583086Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T09:33:41.4584271Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T09:33:41.4586132Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 
2025-12-04T09:33:41.4587633Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T09:33:41.4588773Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T09:33:41.4590619Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T09:33:41.4592327Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T09:33:41.4593668Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T09:33:41.4595550Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T09:33:41.4596952Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T09:33:41.4598283Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T09:33:41.4600166Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T09:33:41.4601763Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T09:33:41.4603237Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T09:33:41.4605004Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T09:33:41.4606506Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T09:33:41.4607777Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T09:33:41.4609339Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T09:33:41.4610723Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T09:33:41.4612133Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T09:33:41.4614225Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T09:33:41.4615535Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T09:33:41.4616833Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T09:33:41.4618612Z * [new branch] gh/benjaminglass1/102/base -> 
origin/gh/benjaminglass1/102/base 2025-12-04T09:33:41.4619918Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T09:33:41.4621188Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T09:33:41.4623077Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T09:33:41.4624365Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T09:33:41.4625679Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T09:33:41.4627369Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T09:33:41.4628663Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T09:33:41.4629969Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T09:33:41.4631682Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T09:33:41.4632958Z * [new branch] gh/benjaminglass1/108/head -> origin/gh/benjaminglass1/108/head 2025-12-04T09:33:41.4634239Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T09:33:41.4635940Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T09:33:41.4637202Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T09:33:41.4638551Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T09:33:41.4640311Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T09:33:41.4641574Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T09:33:41.4642981Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T09:33:41.4644979Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T09:33:41.4646317Z * [new 
branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T09:33:41.4647599Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T09:33:41.4649187Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T09:33:41.4650555Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T09:33:41.4651798Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T09:33:41.4653495Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T09:33:41.4654790Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T09:33:41.4656067Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T09:33:41.4657902Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T09:33:41.4659189Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T09:33:41.4660459Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T09:33:41.4662304Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T09:33:41.4663627Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T09:33:41.4664950Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T09:33:41.4666628Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T09:33:41.4667858Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T09:33:41.4669104Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T09:33:41.4670885Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T09:33:41.4672058Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T09:33:41.4673410Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T09:33:41.4675233Z * [new 
branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T09:33:41.4676795Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T09:33:41.4678041Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T09:33:41.4679817Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T09:33:41.4681102Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T09:33:41.4683014Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T09:33:41.4684581Z * [new branch] gh/bobrenjc93/681/base -> origin/gh/bobrenjc93/681/base 2025-12-04T09:33:41.4685868Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T09:33:41.4687203Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T09:33:41.4688751Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T09:33:41.4690037Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T09:33:41.4691308Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T09:33:41.4693097Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T09:33:41.4694441Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T09:33:41.4695671Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T09:33:41.4697417Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T09:33:41.4698921Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T09:33:41.4700430Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T09:33:41.4702333Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T09:33:41.4703928Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T09:33:41.4705652Z * [new 
branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T09:33:41.4707573Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T09:33:41.4711527Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T09:33:41.4711797Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T09:33:41.4712452Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T09:33:41.4714525Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T09:33:41.4715184Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T09:33:41.4717546Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T09:33:41.4718873Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T09:33:41.4720166Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T09:33:41.4721813Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T09:33:41.4723351Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T09:33:41.4724667Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T09:33:41.4726310Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T09:33:41.4727580Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T09:33:41.4728902Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T09:33:41.4731512Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T09:33:41.4733156Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T09:33:41.4734925Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T09:33:41.4737458Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T09:33:41.4738773Z * [new 
branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T09:33:41.4740070Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 2025-12-04T09:33:41.4741698Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T09:33:41.4742928Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T09:33:41.4744305Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T09:33:41.4746131Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T09:33:41.4747475Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T09:33:41.4748830Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T09:33:41.4750543Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T09:33:41.4751850Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T09:33:41.4753128Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T09:33:41.4755252Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T09:33:41.4756582Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T09:33:41.4758405Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T09:33:41.4759664Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T09:33:41.4760930Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T09:33:41.4762531Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T09:33:41.4763919Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T09:33:41.4765290Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T09:33:41.4767090Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T09:33:41.4768433Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T09:33:41.4769633Z * [new branch] gh/c00w/56/orig -> 
origin/gh/c00w/56/orig 2025-12-04T09:33:41.4771262Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T09:33:41.4772536Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T09:33:41.4773841Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T09:33:41.4775467Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T09:33:41.4776728Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T09:33:41.4777988Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T09:33:41.4780057Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T09:33:41.4781415Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T09:33:41.4782801Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T09:33:41.4785021Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T09:33:41.4786462Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T09:33:41.4788518Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T09:33:41.4789741Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T09:33:41.4791148Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T09:33:41.4792996Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T09:33:41.4794641Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T09:33:41.4796116Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T09:33:41.4797927Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 2025-12-04T09:33:41.4799293Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T09:33:41.4800740Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T09:33:41.4804817Z * [new branch] 
gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T09:33:41.4806170Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T09:33:41.4807524Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T09:33:41.4809143Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T09:33:41.4810490Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T09:33:41.4812148Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T09:33:41.4813499Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T09:33:41.4814850Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T09:33:41.4816096Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T09:33:41.4818018Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T09:33:41.4819427Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T09:33:41.4820864Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T09:33:41.4822779Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T09:33:41.4824272Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T09:33:41.4825517Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T09:33:41.4827211Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T09:33:41.4828567Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T09:33:41.4829922Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T09:33:41.4831697Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T09:33:41.4832926Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 
2025-12-04T09:33:41.4834151Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T09:33:41.4836108Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T09:33:41.4837321Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T09:33:41.4838676Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T09:33:41.4841052Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T09:33:41.4842490Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T09:33:41.4844268Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T09:33:41.4845648Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T09:33:41.4846980Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T09:33:41.4848311Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T09:33:41.4850084Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T09:33:41.4851393Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T09:33:41.4852693Z * [new branch] gh/coconutruben/86/orig -> origin/gh/coconutruben/86/orig 2025-12-04T09:33:41.4854803Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T09:33:41.4856277Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T09:33:41.4857887Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T09:33:41.4859078Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T09:33:41.4860594Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T09:33:41.4861872Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T09:33:41.4863373Z * [new branch] gh/colinchan15/6/base -> 
origin/gh/colinchan15/6/base 2025-12-04T09:33:41.4865155Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T09:33:41.4867179Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T09:33:41.4868452Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T09:33:41.4870139Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T09:33:41.4871502Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T09:33:41.4872777Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T09:33:41.4874423Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T09:33:41.4875707Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T09:33:41.4877030Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T09:33:41.4878717Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T09:33:41.4879995Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T09:33:41.4881240Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T09:33:41.4883064Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T09:33:41.4884331Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T09:33:41.4886569Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T09:33:41.4887840Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T09:33:41.4889142Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T09:33:41.4891004Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T09:33:41.4892331Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T09:33:41.4893654Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T09:33:41.4895693Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 
2025-12-04T09:33:41.4896973Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T09:33:41.4898290Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T09:33:41.4899965Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T09:33:41.4901341Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T09:33:41.4903095Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T09:33:41.4904909Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T09:33:41.4906156Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T09:33:41.4907505Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T09:33:41.4909215Z * [new branch] gh/desertfire/608/base -> origin/gh/desertfire/608/base 2025-12-04T09:33:41.4910445Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T09:33:41.4911818Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T09:33:41.4913442Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T09:33:41.4914708Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T09:33:41.4915994Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T09:33:41.4917943Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T09:33:41.4919519Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T09:33:41.4920908Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T09:33:41.4922759Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T09:33:41.4924183Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T09:33:41.4925556Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 
2025-12-04T09:33:41.4927290Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T09:33:41.4928696Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T09:33:41.4929883Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T09:33:41.4932099Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T09:33:41.4933496Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T09:33:41.4934836Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T09:33:41.4936699Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T09:33:41.4938150Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T09:33:41.4939456Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T09:33:41.4941341Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T09:33:41.4942923Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T09:33:41.4944196Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T09:33:41.4945761Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T09:33:41.4947151Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T09:33:41.4948357Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T09:33:41.4949990Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T09:33:41.4951338Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T09:33:41.4952535Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T09:33:41.4954602Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T09:33:41.4955993Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 
2025-12-04T09:33:41.4958201Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T09:33:41.4959419Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T09:33:41.4960708Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T09:33:41.4962417Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T09:33:41.4963862Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T09:33:41.4965431Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T09:33:41.4966617Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T09:33:41.4968154Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T09:33:41.4969291Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T09:33:41.4971051Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T09:33:41.4972340Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T09:33:41.4974123Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T09:33:41.4975447Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T09:33:41.4976701Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T09:33:41.4978407Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T09:33:41.4979679Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T09:33:41.4981580Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T09:33:41.5024501Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T09:33:41.5025052Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T09:33:41.5025424Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T09:33:41.5025736Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 
2025-12-04T09:33:41.5025983Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T09:33:41.5026243Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T09:33:41.5026486Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T09:33:41.5026745Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T09:33:41.5026996Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T09:33:41.5027240Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T09:33:41.5027498Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T09:33:41.5027910Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T09:33:41.5028171Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T09:33:41.5028414Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T09:33:41.5028655Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T09:33:41.5028914Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T09:33:41.5029155Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T09:33:41.5029403Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T09:33:41.5029662Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T09:33:41.5029903Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T09:33:41.5030170Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T09:33:41.5030415Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T09:33:41.5030658Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T09:33:41.5030920Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T09:33:41.5031166Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 
2025-12-04T09:33:41.5031421Z * [new branch] gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T09:33:41.5031668Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T09:33:41.5031910Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T09:33:41.5032170Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T09:33:41.5032425Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T09:33:41.5032686Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T09:33:41.5032932Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T09:33:41.5033175Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T09:33:41.5033435Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T09:33:41.5033678Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T09:33:41.5034019Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T09:33:41.5035704Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T09:33:41.5036888Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T09:33:41.5038167Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T09:33:41.5040288Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T09:33:41.5041632Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T09:33:41.5044036Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T09:33:41.5045383Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T09:33:41.5047344Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T09:33:41.5048737Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T09:33:41.5050058Z * [new branch] gh/dzmitry-huba/12/orig -> 
origin/gh/dzmitry-huba/12/orig 2025-12-04T09:33:41.5051950Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T09:33:41.5053280Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T09:33:41.5054558Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T09:33:41.5056236Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T09:33:41.5057534Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T09:33:41.5058821Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T09:33:41.5060708Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T09:33:41.5061990Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T09:33:41.5063178Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T09:33:41.5065106Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T09:33:41.5066513Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T09:33:41.5067878Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T09:33:41.5069587Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T09:33:41.5070887Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T09:33:41.5072198Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T09:33:41.5073710Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T09:33:41.5074901Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T09:33:41.5076576Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T09:33:41.5077749Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T09:33:41.5079889Z * [new 
branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T09:33:41.5081265Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T09:33:41.5082625Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T09:33:41.5084744Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T09:33:41.5086144Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T09:33:41.5087373Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T09:33:41.5089108Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T09:33:41.5090408Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T09:33:41.5091700Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T09:33:41.5093476Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T09:33:41.5094746Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T09:33:41.5095990Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T09:33:41.5097746Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T09:33:41.5098991Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T09:33:41.5100319Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T09:33:41.5102215Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T09:33:41.5103551Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T09:33:41.5104886Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T09:33:41.5106628Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T09:33:41.5108683Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T09:33:41.5109723Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 
2025-12-04T09:33:41.5111506Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T09:33:41.5112803Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T09:33:41.5114304Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T09:33:41.5115997Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T09:33:41.5117240Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T09:33:41.5118553Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T09:33:41.5120431Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T09:33:41.5122004Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T09:33:41.5123454Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T09:33:41.5125146Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T09:33:41.5126390Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T09:33:41.5128182Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T09:33:41.5130073Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T09:33:41.5131318Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T09:33:41.5132556Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T09:33:41.5134355Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T09:33:41.5135549Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T09:33:41.5136934Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T09:33:41.5138733Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T09:33:41.5139940Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T09:33:41.5141226Z * [new branch] gh/eellison/872/orig -> 
origin/gh/eellison/872/orig 2025-12-04T09:33:41.5143145Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T09:33:41.5144459Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T09:33:41.5145738Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T09:33:41.5147581Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T09:33:41.5149173Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T09:33:41.5150464Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T09:33:41.5152681Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T09:33:41.5154111Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T09:33:41.5155404Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T09:33:41.5157204Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T09:33:41.5158985Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T09:33:41.5159798Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T09:33:41.5161645Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T09:33:41.5163058Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T09:33:41.5164274Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T09:33:41.5166222Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T09:33:41.5167420Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T09:33:41.5168687Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T09:33:41.5170507Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T09:33:41.5171800Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T09:33:41.5173089Z * [new branch] 
gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T09:33:41.5174656Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T09:33:41.5176016Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T09:33:41.5177306Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T09:33:41.5179091Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T09:33:41.5180396Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T09:33:41.5181681Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T09:33:41.5183534Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T09:33:41.5184815Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T09:33:41.5186409Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T09:33:41.5188115Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T09:33:41.5189384Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T09:33:41.5190692Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T09:33:41.5192262Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T09:33:41.5193586Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T09:33:41.5194769Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T09:33:41.5196873Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T09:33:41.5198165Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T09:33:41.5200180Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T09:33:41.5205025Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T09:33:41.5206517Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T09:33:41.5208218Z * [new branch] gh/etaf/156/base -> 
origin/gh/etaf/156/base 2025-12-04T09:33:41.5209538Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T09:33:41.5210888Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T09:33:41.5212782Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T09:33:41.5214103Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T09:33:41.5215450Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T09:33:41.5217127Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T09:33:41.5218504Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T09:33:41.5219792Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T09:33:41.5221734Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T09:33:41.5223031Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T09:33:41.5224301Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T09:33:41.5226210Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T09:33:41.5227534Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T09:33:41.5228901Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T09:33:41.5230623Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T09:33:41.5232015Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T09:33:41.5233374Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T09:33:41.5235114Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T09:33:41.5236603Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T09:33:41.5237840Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T09:33:41.5239603Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T09:33:41.5240894Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T09:33:41.5242246Z * [new 
branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T09:33:41.5244212Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T09:33:41.5245589Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T09:33:41.5246937Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T09:33:41.5248792Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T09:33:41.5250045Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T09:33:41.5251350Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T09:33:41.5253311Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T09:33:41.5254737Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T09:33:41.5256019Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T09:33:41.5257975Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T09:33:41.5259169Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T09:33:41.5260916Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T09:33:41.5262206Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T09:33:41.5263374Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T09:33:41.5265227Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T09:33:41.5266619Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T09:33:41.5267920Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T09:33:41.5270072Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T09:33:41.5271622Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T09:33:41.5272929Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T09:33:41.5274912Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T09:33:41.5276488Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 
2025-12-04T09:33:41.5277786Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T09:33:41.5279560Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T09:33:41.5280890Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T09:33:41.5282168Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T09:33:41.5283972Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T09:33:41.5285219Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T09:33:41.5286520Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T09:33:41.5288577Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T09:33:41.5290002Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T09:33:41.5291535Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T09:33:41.5292821Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T09:33:41.5294688Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T09:33:41.5295793Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T09:33:41.5297487Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T09:33:41.5298735Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T09:33:41.5300992Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T09:33:41.5302439Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T09:33:41.5303858Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T09:33:41.5305459Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T09:33:41.5306661Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T09:33:41.5308007Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 
2025-12-04T09:33:41.5309761Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T09:33:41.5311033Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T09:33:41.5312341Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T09:33:41.5314017Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T09:33:41.5315315Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T09:33:41.5316570Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T09:33:41.5318241Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T09:33:41.5319469Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T09:33:41.5320776Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T09:33:41.5322460Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T09:33:41.5324277Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T09:33:41.5325635Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T09:33:41.5327465Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T09:33:41.5328760Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T09:33:41.5330539Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T09:33:41.5332265Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T09:33:41.5333597Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T09:33:41.5334860Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T09:33:41.5336583Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T09:33:41.5337852Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T09:33:41.5339203Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 
2025-12-04T09:33:41.5340896Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T09:33:41.5342136Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T09:33:41.5343491Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T09:33:41.5345255Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T09:33:41.5346527Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T09:33:41.5347793Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T09:33:41.5349459Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T09:33:41.5350761Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T09:33:41.5351994Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T09:33:41.5353686Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T09:33:41.5355018Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T09:33:41.5356218Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T09:33:41.5357845Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T09:33:41.5359147Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T09:33:41.5360425Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T09:33:41.5362238Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T09:33:41.5363588Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T09:33:41.5364882Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T09:33:41.5367080Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T09:33:41.5368371Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T09:33:41.5369755Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 
2025-12-04T09:33:41.5371497Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T09:33:41.5372799Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T09:33:41.5374112Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T09:33:41.5375840Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T09:33:41.5377110Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T09:33:41.5378375Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T09:33:41.5380171Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T09:33:41.5381743Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T09:33:41.5383046Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T09:33:41.5384814Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T09:33:41.5386058Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T09:33:41.5387385Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T09:33:41.5389072Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T09:33:41.5390384Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T09:33:41.5391676Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T09:33:41.5393360Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T09:33:41.5394649Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T09:33:41.5395950Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T09:33:41.5397795Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T09:33:41.5399026Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T09:33:41.5400377Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 
2025-12-04T09:33:41.5402428Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T09:33:41.5403799Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T09:33:41.5405098Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T09:33:41.5406815Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T09:33:41.5408173Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T09:33:41.5409325Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T09:33:41.5411088Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T09:33:41.5412574Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T09:33:41.5413612Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T09:33:41.5415460Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T09:33:41.5416722Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T09:33:41.5418218Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T09:33:41.5419924Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T09:33:41.5421231Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T09:33:41.5422496Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T09:33:41.5424259Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T09:33:41.5425549Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T09:33:41.5426848Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T09:33:41.5428503Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T09:33:41.5429759Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T09:33:41.5431095Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 
2025-12-04T09:33:41.5432868Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T09:33:41.5434144Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T09:33:41.5435433Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T09:33:41.5437128Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T09:33:41.5438406Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T09:33:41.5439676Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T09:33:41.5441398Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T09:33:41.5442747Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T09:33:41.5444093Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T09:33:41.5446080Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T09:33:41.5447336Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T09:33:41.5448751Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T09:33:41.5450465Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T09:33:41.5451781Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T09:33:41.5453032Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T09:33:41.5454699Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T09:33:41.5455951Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T09:33:41.5457233Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T09:33:41.5459025Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T09:33:41.5460223Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T09:33:41.5461575Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T09:33:41.5463278Z * [new branch] 
gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T09:33:41.5464570Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T09:33:41.5465909Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T09:33:41.5467610Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T09:33:41.5468866Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T09:33:41.5470181Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T09:33:41.5472201Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T09:33:41.5473457Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T09:33:41.5474729Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T09:33:41.5476529Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T09:33:41.5477806Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T09:33:41.5479061Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T09:33:41.5480766Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T09:33:41.5482131Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T09:33:41.5483548Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T09:33:41.5485265Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T09:33:41.5486575Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T09:33:41.5487832Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T09:33:41.5489632Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T09:33:41.5490828Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T09:33:41.5492119Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T09:33:41.5493953Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 
2025-12-04T09:33:41.5495203Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T09:33:41.5496500Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T09:33:41.5498226Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T09:33:41.5499586Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T09:33:41.5500997Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T09:33:41.5502929Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T09:33:41.5504164Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T09:33:41.5505439Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T09:33:41.5507133Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T09:33:41.5508505Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T09:33:41.5509802Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T09:33:41.5511489Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T09:33:41.5512701Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T09:33:41.5513933Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T09:33:41.5515474Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T09:33:41.5516874Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T09:33:41.5518126Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T09:33:41.5519850Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T09:33:41.5521202Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T09:33:41.5522538Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T09:33:41.5524343Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T09:33:41.5525778Z * [new branch] gh/fduwjj/239/head -> 
origin/gh/fduwjj/239/head 2025-12-04T09:33:41.5526998Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T09:33:41.5529014Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T09:33:41.5530903Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T09:33:41.5532251Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T09:33:41.5534109Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T09:33:41.5535374Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T09:33:41.5536715Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T09:33:41.5538404Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T09:33:41.5539680Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T09:33:41.5541119Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T09:33:41.5542823Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T09:33:41.5544081Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T09:33:41.5545327Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T09:33:41.5547342Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T09:33:41.5548604Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T09:33:41.5550897Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T09:33:41.5552176Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T09:33:41.5553515Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T09:33:41.5555256Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T09:33:41.5556503Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T09:33:41.5557799Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T09:33:41.5559442Z * [new branch] gh/fffrog/181/base -> 
origin/gh/fffrog/181/base 2025-12-04T09:33:41.5560748Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T09:33:41.5562139Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T09:33:41.5564015Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T09:33:41.5565163Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T09:33:41.5566396Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T09:33:41.5568624Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T09:33:41.5569838Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T09:33:41.5571108Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T09:33:41.5573167Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T09:33:41.5574179Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T09:33:41.5575499Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T09:33:41.5577186Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T09:33:41.5578588Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T09:33:41.5579896Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T09:33:41.5581546Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T09:33:41.5582885Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T09:33:41.5584212Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T09:33:41.5586120Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T09:33:41.5587307Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T09:33:41.5589044Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T09:33:41.5590787Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T09:33:41.5592055Z * [new 
branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T09:33:41.5593341Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T09:33:41.5595019Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T09:33:41.5596329Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T09:33:41.5597639Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T09:33:41.5599835Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T09:33:41.5601350Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T09:33:41.5605201Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T09:33:41.5607069Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T09:33:41.5608259Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T09:33:41.5609869Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T09:33:41.5611963Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T09:33:41.5613260Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T09:33:41.5614632Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T09:33:41.5616322Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T09:33:41.5617619Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T09:33:41.5619008Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T09:33:41.5620764Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T09:33:41.5621968Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T09:33:41.5623969Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T09:33:41.5625592Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T09:33:41.5626904Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T09:33:41.5628204Z * [new branch] gh/guangyey/134/orig -> 
origin/gh/guangyey/134/orig 2025-12-04T09:33:41.5629869Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T09:33:41.5631149Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T09:33:41.5632452Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T09:33:41.5634126Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T09:33:41.5635414Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T09:33:41.5636683Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T09:33:41.5638387Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T09:33:41.5639748Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T09:33:41.5641069Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T09:33:41.5642867Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T09:33:41.5644154Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T09:33:41.5645462Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T09:33:41.5647672Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T09:33:41.5648947Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T09:33:41.5650208Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T09:33:41.5651972Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T09:33:41.5653394Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T09:33:41.5654606Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T09:33:41.5656304Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T09:33:41.5657723Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T09:33:41.5659003Z * [new branch] 
gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T09:33:41.5660600Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T09:33:41.5661862Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T09:33:41.5663205Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T09:33:41.5664928Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T09:33:41.5666236Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T09:33:41.5667504Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T09:33:41.5669220Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T09:33:41.5670522Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T09:33:41.5672277Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T09:33:41.5673972Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T09:33:41.5675328Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T09:33:41.5676618Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T09:33:41.5678307Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T09:33:41.5679554Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T09:33:41.5680845Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T09:33:41.5682590Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T09:33:41.5683961Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T09:33:41.5685223Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T09:33:41.5686972Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T09:33:41.5688454Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 
2025-12-04T09:33:41.5689742Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T09:33:41.5691393Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T09:33:41.5692780Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T09:33:41.5694086Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T09:33:41.5696191Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T09:33:41.5697574Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T09:33:41.5698850Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T09:33:41.5700569Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T09:33:41.5702193Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 2025-12-04T09:33:41.5703452Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T09:33:41.5705204Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T09:33:41.5706490Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T09:33:41.5707808Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T09:33:41.5709549Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T09:33:41.5710921Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T09:33:41.5712181Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T09:33:41.5713895Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T09:33:41.5715185Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T09:33:41.5716466Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T09:33:41.5718207Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T09:33:41.5719458Z * [new branch] gh/guangyey/235/head -> 
origin/gh/guangyey/235/head 2025-12-04T09:33:41.5720718Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T09:33:41.5723319Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T09:33:41.5724818Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T09:33:41.5726020Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T09:33:41.5727764Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T09:33:41.5729156Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T09:33:41.5730413Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T09:33:41.5732144Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T09:33:41.5733424Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T09:33:41.5735171Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T09:33:41.5736437Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T09:33:41.5737716Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T09:33:41.5739461Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T09:33:41.5741270Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T09:33:41.5742601Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T09:33:41.5744566Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T09:33:41.5745891Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T09:33:41.5747240Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T09:33:41.5749134Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T09:33:41.5750473Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T09:33:41.5751754Z * [new branch] 
gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T09:33:41.5753566Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T09:33:41.5754854Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T09:33:41.5756131Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T09:33:41.5757992Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 2025-12-04T09:33:41.5759257Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T09:33:41.5760537Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T09:33:41.5762356Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T09:33:41.5763713Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T09:33:41.5765115Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T09:33:41.5766884Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T09:33:41.5768174Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T09:33:41.5769427Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T09:33:41.5771233Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T09:33:41.5772523Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T09:33:41.5773791Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T09:33:41.5775609Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T09:33:41.5776919Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T09:33:41.5778136Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T09:33:41.5779826Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T09:33:41.5781177Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 
2025-12-04T09:33:41.5782544Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T09:33:41.5784287Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T09:33:41.5785668Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T09:33:41.5786977Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T09:33:41.5788657Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T09:33:41.5789991Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T09:33:41.5791652Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T09:33:41.5793406Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T09:33:41.5794655Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T09:33:41.5795957Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T09:33:41.5797727Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T09:33:41.5799022Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T09:33:41.5800413Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T09:33:41.5803087Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T09:33:41.5804389Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T09:33:41.5805651Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T09:33:41.5807431Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T09:33:41.5808686Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T09:33:41.5809991Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T09:33:41.5812233Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T09:33:41.5814026Z * [new branch] 
gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-12-04T09:33:41.5815359Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T09:33:41.5817366Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T09:33:41.5818472Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T09:33:41.5819712Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T09:33:41.5821467Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T09:33:41.5824557Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T09:33:41.5825688Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T09:33:41.5827425Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T09:33:41.5828692Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T09:33:41.5830011Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T09:33:41.5831778Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T09:33:41.5832926Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T09:33:41.5835126Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T09:33:41.5835809Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T09:33:41.5837309Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T09:33:41.5838684Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T09:33:41.5841021Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T09:33:41.5842249Z * [new branch] 
gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T09:33:41.5843535Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T09:33:41.5845318Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T09:33:41.5846528Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T09:33:41.5847828Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T09:33:41.5849547Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T09:33:41.5850877Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T09:33:41.5852283Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T09:33:41.5853998Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T09:33:41.5855566Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T09:33:41.5856880Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T09:33:41.5858591Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T09:33:41.5859852Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T09:33:41.5861171Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T09:33:41.5862841Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-12-04T09:33:41.5864100Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T09:33:41.5865355Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T09:33:41.5867060Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T09:33:41.5868321Z * [new branch] 
gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T09:33:41.5869600Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T09:33:41.5871327Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T09:33:41.5872674Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T09:33:41.5874010Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T09:33:41.5877379Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T09:33:41.5877676Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T09:33:41.5878520Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T09:33:41.5879970Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T09:33:41.5881131Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T09:33:41.5882527Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T09:33:41.5884971Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T09:33:41.5886234Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T09:33:41.5887608Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T09:33:41.5889414Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T09:33:41.5890802Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T09:33:41.5892079Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T09:33:41.5893823Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T09:33:41.5895118Z * [new branch] 
gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T09:33:41.5896412Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T09:33:41.5898206Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T09:33:41.5899524Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T09:33:41.5901126Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T09:33:41.5902926Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T09:33:41.5904172Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T09:33:41.5905655Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T09:33:41.5907601Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T09:33:41.5908714Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T09:33:41.5910113Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T09:33:41.5911792Z * [new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T09:33:41.5913126Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T09:33:41.5914399Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T09:33:41.5916161Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T09:33:41.5917430Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T09:33:41.5918710Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T09:33:41.5920496Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T09:33:41.5921707Z * [new branch] 
gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T09:33:41.5923133Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T09:33:41.5924899Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T09:33:41.5926397Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T09:33:41.5927631Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T09:33:41.5929503Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T09:33:41.5930754Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T09:33:41.5932011Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T09:33:41.5933826Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T09:33:41.5935096Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T09:33:41.5936373Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T09:33:41.5938132Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T09:33:41.5939424Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T09:33:41.5940795Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T09:33:41.5942524Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T09:33:41.5943922Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T09:33:41.5945206Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T09:33:41.5946992Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T09:33:41.5948257Z * [new branch] 
gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T09:33:41.5949553Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T09:33:41.5952174Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T09:33:41.5953977Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T09:33:41.5955202Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T09:33:41.5956463Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T09:33:41.5957800Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T09:33:41.5959432Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 2025-12-04T09:33:41.5960840Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T09:33:41.5962331Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T09:33:41.5964045Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T09:33:41.5965347Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T09:33:41.5966487Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T09:33:41.5968458Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T09:33:41.5970012Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T09:33:41.5971671Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T09:33:41.5973391Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T09:33:41.5975119Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T09:33:41.5976764Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T09:33:41.5978839Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T09:33:41.5980121Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 
2025-12-04T09:33:41.5982250Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T09:33:41.5983460Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T09:33:41.5985763Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T09:33:41.5987035Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T09:33:41.5988340Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T09:33:41.5990588Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T09:33:41.5991843Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T09:33:41.5993434Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T09:33:41.5994684Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T09:33:41.5996403Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T09:33:41.5997652Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T09:33:41.5998982Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T09:33:41.6000650Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T09:33:41.6005104Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T09:33:41.6006410Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T09:33:41.6008450Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T09:33:41.6009902Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T09:33:41.6011163Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T09:33:41.6012841Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T09:33:41.6014077Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T09:33:41.6015365Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T09:33:41.6017070Z * [new branch] 
gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T09:33:41.6018357Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T09:33:41.6019653Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T09:33:41.6021326Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 2025-12-04T09:33:41.6022613Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T09:33:41.6023853Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T09:33:41.6025589Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T09:33:41.6027174Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T09:33:41.6028537Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T09:33:41.6030361Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T09:33:41.6031679Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T09:33:41.6032942Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T09:33:41.6034703Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T09:33:41.6035974Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T09:33:41.6037660Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T09:33:41.6038779Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T09:33:41.6040275Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T09:33:41.6041504Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T09:33:41.6043250Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T09:33:41.6044462Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T09:33:41.6045953Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T09:33:41.6047181Z * [new branch] 
gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T09:33:41.6048704Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T09:33:41.6049932Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T09:33:41.6051462Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T09:33:41.6052681Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T09:33:41.6054245Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T09:33:41.6055510Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T09:33:41.6057036Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T09:33:41.6058442Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T09:33:41.6059845Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T09:33:41.6061058Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T09:33:41.6062594Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T09:33:41.6063759Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T09:33:41.6065287Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T09:33:41.6066557Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T09:33:41.6068820Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T09:33:41.6070097Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T09:33:41.6072026Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T09:33:41.6073266Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T09:33:41.6075531Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T09:33:41.6076880Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T09:33:41.6078145Z * [new branch] gh/janeyx99/165/orig 
-> origin/gh/janeyx99/165/orig 2025-12-04T09:33:41.6079724Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T09:33:41.6080970Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T09:33:41.6082290Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T09:33:41.6084385Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T09:33:41.6085685Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T09:33:41.6086961Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T09:33:41.6088664Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T09:33:41.6090022Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T09:33:41.6091364Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T09:33:41.6093692Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T09:33:41.6094960Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T09:33:41.6096538Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T09:33:41.6097840Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T09:33:41.6099494Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T09:33:41.6100944Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T09:33:41.6102651Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T09:33:41.6103856Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T09:33:41.6105547Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T09:33:41.6106914Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T09:33:41.6108316Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T09:33:41.6109998Z * [new branch] 
gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T09:33:41.6111316Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T09:33:41.6112591Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T09:33:41.6114316Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T09:33:41.6115591Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T09:33:41.6116838Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T09:33:41.6118758Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T09:33:41.6120061Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T09:33:41.6121311Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T09:33:41.6123183Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T09:33:41.6124448Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T09:33:41.6125823Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T09:33:41.6127519Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T09:33:41.6129287Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T09:33:41.6130985Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T09:33:41.6132745Z * [new branch] gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T09:33:41.6134069Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T09:33:41.6135382Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T09:33:41.6136948Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T09:33:41.6138265Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T09:33:41.6139550Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 
2025-12-04T09:33:41.6142248Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T09:33:41.6143714Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T09:33:41.6144989Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T09:33:41.6147257Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T09:33:41.6148818Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T09:33:41.6149881Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T09:33:41.6151603Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T09:33:41.6152857Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T09:33:41.6154118Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T09:33:41.6155705Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T09:33:41.6156995Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T09:33:41.6158228Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T09:33:41.6160111Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T09:33:41.6161590Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T09:33:41.6162784Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T09:33:41.6164976Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T09:33:41.6166189Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T09:33:41.6167838Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T09:33:41.6169227Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T09:33:41.6170494Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T09:33:41.6172140Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 
2025-12-04T09:33:41.6173365Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T09:33:41.6174639Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T09:33:41.6176305Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T09:33:41.6177504Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T09:33:41.6178866Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T09:33:41.6180530Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T09:33:41.6181786Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T09:33:41.6183015Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T09:33:41.6184718Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T09:33:41.6185960Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T09:33:41.6187217Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 2025-12-04T09:33:41.6188908Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T09:33:41.6190159Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T09:33:41.6191421Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T09:33:41.6193074Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T09:33:41.6194535Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T09:33:41.6195862Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T09:33:41.6197617Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T09:33:41.6198892Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T09:33:41.6200150Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T09:33:41.6202072Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T09:33:41.6203621Z * [new branch] gh/jansel/557/head -> 
origin/gh/jansel/557/head 2025-12-04T09:33:41.6204705Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T09:33:41.6206397Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T09:33:41.6207688Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T09:33:41.6208962Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T09:33:41.6210609Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T09:33:41.6211897Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T09:33:41.6213287Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T09:33:41.6214992Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T09:33:41.6216230Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T09:33:41.6217472Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T09:33:41.6219181Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T09:33:41.6220437Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T09:33:41.6221666Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T09:33:41.6223345Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T09:33:41.6224593Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T09:33:41.6225867Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T09:33:41.6227505Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T09:33:41.6228762Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T09:33:41.6230375Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T09:33:41.6232281Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T09:33:41.6233541Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T09:33:41.6234826Z * [new 
branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T09:33:41.6236591Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T09:33:41.6237843Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T09:33:41.6239129Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T09:33:41.6240883Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T09:33:41.6242242Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T09:33:41.6243553Z * [new branch] gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T09:33:41.6245278Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T09:33:41.6246662Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T09:33:41.6247962Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T09:33:41.6249802Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T09:33:41.6251083Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T09:33:41.6252354Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T09:33:41.6254063Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T09:33:41.6255297Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T09:33:41.6256565Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T09:33:41.6258778Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T09:33:41.6260087Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T09:33:41.6261314Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T09:33:41.6263021Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T09:33:41.6264322Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T09:33:41.6265707Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 
2025-12-04T09:33:41.6267310Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T09:33:41.6269037Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T09:33:41.6270328Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T09:33:41.6272137Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T09:33:41.6273402Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T09:33:41.6274671Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T09:33:41.6276416Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T09:33:41.6277718Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T09:33:41.6278981Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T09:33:41.6280972Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T09:33:41.6282334Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T09:33:41.6283795Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T09:33:41.6285607Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T09:33:41.6286841Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T09:33:41.6288126Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T09:33:41.6290242Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T09:33:41.6292034Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T09:33:41.6293315Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T09:33:41.6295071Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T09:33:41.6296277Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T09:33:41.6297586Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 
2025-12-04T09:33:41.6300217Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T09:33:41.6301566Z * [new branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T09:33:41.6302974Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T09:33:41.6304957Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T09:33:41.6306426Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T09:33:41.6307750Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T09:33:41.6309373Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T09:33:41.6310667Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T09:33:41.6311930Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T09:33:41.6313686Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T09:33:41.6314899Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T09:33:41.6316182Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T09:33:41.6317963Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T09:33:41.6319221Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T09:33:41.6321030Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T09:33:41.6322543Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T09:33:41.6323866Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T09:33:41.6325618Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T09:33:41.6327295Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T09:33:41.6328566Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T09:33:41.6329821Z * [new branch] gh/jiayisunx/79/orig -> 
origin/gh/jiayisunx/79/orig 2025-12-04T09:33:41.6331606Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T09:33:41.6332836Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T09:33:41.6334148Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T09:33:41.6335905Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T09:33:41.6337290Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T09:33:41.6338512Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T09:33:41.6340585Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T09:33:41.6341887Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T09:33:41.6343137Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T09:33:41.6344805Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T09:33:41.6346035Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T09:33:41.6347314Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T09:33:41.6348958Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T09:33:41.6350193Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T09:33:41.6351839Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T09:33:41.6353433Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T09:33:41.6354740Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T09:33:41.6355984Z * [new branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T09:33:41.6357639Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T09:33:41.6358917Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T09:33:41.6360201Z * [new branch] 
gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T09:33:41.6361877Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T09:33:41.6363244Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T09:33:41.6364517Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T09:33:41.6366149Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T09:33:41.6367391Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T09:33:41.6368663Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T09:33:41.6370673Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T09:33:41.6371918Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T09:33:41.6373886Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T09:33:41.6375178Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T09:33:41.6376455Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T09:33:41.6378139Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T09:33:41.6379415Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T09:33:41.6380674Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T09:33:41.6382992Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T09:33:41.6384482Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T09:33:41.6385773Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T09:33:41.6387943Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T09:33:41.6389318Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T09:33:41.6390686Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T09:33:41.6392857Z * [new 
branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T09:33:41.6394213Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T09:33:41.6395500Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T09:33:41.6397269Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T09:33:41.6398652Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T09:33:41.6399948Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T09:33:41.6405130Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T09:33:41.6406720Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T09:33:41.6408170Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T09:33:41.6410080Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T09:33:41.6411422Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T09:33:41.6413128Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T09:33:41.6414797Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 2025-12-04T09:33:41.6416117Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T09:33:41.6417393Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T09:33:41.6419006Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T09:33:41.6420202Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T09:33:41.6421460Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T09:33:41.6423327Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T09:33:41.6424904Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T09:33:41.6426575Z * [new branch] gh/karthickai/18/orig -> 
origin/gh/karthickai/18/orig 2025-12-04T09:33:41.6428389Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T09:33:41.6429725Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T09:33:41.6431004Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T09:33:41.6433648Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T09:33:41.6435477Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T09:33:41.6436828Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T09:33:41.6438638Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T09:33:41.6440135Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T09:33:41.6441564Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T09:33:41.6443591Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T09:33:41.6444913Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T09:33:41.6446175Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T09:33:41.6448065Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T09:33:41.6449570Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T09:33:41.6450838Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T09:33:41.6452595Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T09:33:41.6453898Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T09:33:41.6455174Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T09:33:41.6457460Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T09:33:41.6458868Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 
2025-12-04T09:33:41.6460152Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T09:33:41.6461756Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T09:33:41.6463354Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T09:33:41.6464521Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T09:33:41.6467909Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T09:33:41.6469931Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T09:33:41.6471772Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T09:33:41.6473907Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T09:33:41.6475187Z * [new branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T09:33:41.6476498Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T09:33:41.6478721Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T09:33:41.6480007Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T09:33:41.6481304Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T09:33:41.6483615Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T09:33:41.6484901Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T09:33:41.6486139Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T09:33:41.6487907Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T09:33:41.6489141Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T09:33:41.6490413Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T09:33:41.6492133Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T09:33:41.6493390Z * [new branch] gh/kurtamohler/62/head -> 
origin/gh/kurtamohler/62/head 2025-12-04T09:33:41.6494646Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T09:33:41.6496319Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T09:33:41.6497605Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T09:33:41.6498884Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T09:33:41.6500726Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T09:33:41.6502310Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T09:33:41.6503669Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T09:33:41.6505455Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T09:33:41.6506672Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T09:33:41.6507938Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T09:33:41.6509609Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T09:33:41.6510927Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T09:33:41.6512313Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T09:33:41.6513928Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T09:33:41.6515199Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T09:33:41.6516590Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T09:33:41.6518803Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T09:33:41.6520226Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T09:33:41.6521588Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T09:33:41.6523482Z * [new branch] gh/kwen2501/170/base -> 
origin/gh/kwen2501/170/base 2025-12-04T09:33:41.6524774Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T09:33:41.6526545Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T09:33:41.6527875Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T09:33:41.6529193Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-12-04T09:33:41.6530876Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T09:33:41.6532184Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T09:33:41.6533973Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T09:33:41.6535766Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T09:33:41.6537018Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T09:33:41.6538856Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T09:33:41.6540114Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T09:33:41.6541395Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T09:33:41.6543064Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T09:33:41.6544389Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T09:33:41.6545647Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T09:33:41.6547545Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T09:33:41.6548787Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T09:33:41.6550043Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T09:33:41.6551805Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T09:33:41.6553058Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T09:33:41.6554329Z * [new branch] 
gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T09:33:41.6555938Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T09:33:41.6557304Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T09:33:41.6558530Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T09:33:41.6560224Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T09:33:41.6561501Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T09:33:41.6562894Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T09:33:41.6564596Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T09:33:41.6565796Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T09:33:41.6567094Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T09:33:41.6568959Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T09:33:41.6570107Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T09:33:41.6571381Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T09:33:41.6572987Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T09:33:41.6574278Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T09:33:41.6575520Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T09:33:41.6577193Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T09:33:41.6578428Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T09:33:41.6579963Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T09:33:41.6581412Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T09:33:41.6582623Z * [new branch] gh/kwen2501/252/head -> origin/gh/kwen2501/252/head 
2025-12-04T09:33:41.6583923Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T09:33:41.6586209Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T09:33:41.6587609Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T09:33:41.6588889Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T09:33:41.6590748Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T09:33:41.6592174Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T09:33:41.6593404Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T09:33:41.6595131Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T09:33:41.6596408Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T09:33:41.6597635Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T09:33:41.6599466Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T09:33:41.6600807Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T09:33:41.6602465Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T09:33:41.6604433Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T09:33:41.6605857Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T09:33:41.6607122Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T09:33:41.6608928Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T09:33:41.6610237Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T09:33:41.6611553Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T09:33:41.6613391Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T09:33:41.6614857Z * [new branch] gh/kwen2501/274/head -> 
origin/gh/kwen2501/274/head 2025-12-04T09:33:41.6616138Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T09:33:41.6618458Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T09:33:41.6619895Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T09:33:41.6621324Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T09:33:41.6623064Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T09:33:41.6624353Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T09:33:41.6625610Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T09:33:41.6627380Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T09:33:41.6628639Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T09:33:41.6629909Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T09:33:41.6631675Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T09:33:41.6632970Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T09:33:41.6634264Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T09:33:41.6636113Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T09:33:41.6637521Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T09:33:41.6638877Z * [new branch] gh/kwen2501/279/orig -> origin/gh/kwen2501/279/orig 2025-12-04T09:33:41.6640797Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T09:33:41.6642138Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T09:33:41.6643584Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T09:33:41.6645364Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T09:33:41.6646669Z * [new branch] 
gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T09:33:41.6647955Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T09:33:41.6649705Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T09:33:41.6651037Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T09:33:41.6652336Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T09:33:41.6654063Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T09:33:41.6655517Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T09:33:41.6656765Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T09:33:41.6658634Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T09:33:41.6659973Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T09:33:41.6661330Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T09:33:41.6663064Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T09:33:41.6664284Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T09:33:41.6665602Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T09:33:41.6667320Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T09:33:41.6668661Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T09:33:41.6669933Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T09:33:41.6671533Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T09:33:41.6672895Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T09:33:41.6674133Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T09:33:41.6676022Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 
2025-12-04T09:33:41.6677347Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T09:33:41.6678631Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T09:33:41.6680618Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T09:33:41.6681904Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T09:33:41.6683420Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T09:33:41.6685132Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T09:33:41.6686394Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T09:33:41.6687674Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T09:33:41.6689448Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T09:33:41.6690972Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T09:33:41.6692534Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T09:33:41.6693850Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T09:33:41.6695539Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T09:33:41.6696805Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T09:33:41.6698608Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T09:33:41.6699831Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T09:33:41.6701347Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T09:33:41.6703534Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T09:33:41.6704758Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T09:33:41.6706007Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 
2025-12-04T09:33:41.6707759Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T09:33:41.6709029Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T09:33:41.6710248Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T09:33:41.6711986Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T09:33:41.6713349Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T09:33:41.6714644Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T09:33:41.6716722Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T09:33:41.6717949Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T09:33:41.6719784Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T09:33:41.6721028Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T09:33:41.6722329Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T09:33:41.6724077Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T09:33:41.6725477Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T09:33:41.6726763Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T09:33:41.6728703Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T09:33:41.6730029Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T09:33:41.6731307Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T09:33:41.6733087Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T09:33:41.6734453Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T09:33:41.6735794Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 
2025-12-04T09:33:41.6737591Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T09:33:41.6738826Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T09:33:41.6740031Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T09:33:41.6741860Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T09:33:41.6743178Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T09:33:41.6744561Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T09:33:41.6746592Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 2025-12-04T09:33:41.6747899Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T09:33:41.6749221Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T09:33:41.6751007Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T09:33:41.6752336Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T09:33:41.6753718Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T09:33:41.6755449Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T09:33:41.6756722Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T09:33:41.6757967Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T09:33:41.6760010Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T09:33:41.6761401Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T09:33:41.6762773Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T09:33:41.6767048Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T09:33:41.6768578Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T09:33:41.6770642Z * 
[new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T09:33:41.6771907Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T09:33:41.6773229Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T09:33:41.6774920Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T09:33:41.6776288Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T09:33:41.6778051Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T09:33:41.6779820Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T09:33:41.6781182Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T09:33:41.6782398Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T09:33:41.6784391Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T09:33:41.6786078Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T09:33:41.6787567Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T09:33:41.6788913Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T09:33:41.6790549Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T09:33:41.6791856Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T09:33:41.6793118Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T09:33:41.6794835Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T09:33:41.6796213Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T09:33:41.6798672Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T09:33:41.6800017Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T09:33:41.6801349Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T09:33:41.6806146Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T09:33:41.6807406Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 
2025-12-04T09:33:41.6808853Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T09:33:41.6810657Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T09:33:41.6812584Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 2025-12-04T09:33:41.6813783Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T09:33:41.6815441Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T09:33:41.6816798Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T09:33:41.6818166Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T09:33:41.6820218Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T09:33:41.6821593Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T09:33:41.6822892Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T09:33:41.6824597Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T09:33:41.6825844Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T09:33:41.6827107Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T09:33:41.6828773Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T09:33:41.6830226Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T09:33:41.6831602Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T09:33:41.6833707Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T09:33:41.6835042Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T09:33:41.6836399Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T09:33:41.6838112Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T09:33:41.6839469Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T09:33:41.6840624Z * [new branch] gh/malfet/586/orig -> 
origin/gh/malfet/586/orig 2025-12-04T09:33:41.6842379Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T09:33:41.6843732Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T09:33:41.6845011Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T09:33:41.6846684Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T09:33:41.6847937Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T09:33:41.6849395Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T09:33:41.6851166Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T09:33:41.6852435Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T09:33:41.6853820Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T09:33:41.6855455Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T09:33:41.6856723Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T09:33:41.6858004Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T09:33:41.6860193Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T09:33:41.6861464Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T09:33:41.6862803Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T09:33:41.6864499Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T09:33:41.6865810Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T09:33:41.6867060Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T09:33:41.6868828Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T09:33:41.6870066Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T09:33:41.6871462Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T09:33:41.6873276Z * [new 
branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T09:33:41.6874552Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T09:33:41.6876290Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T09:33:41.6877936Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T09:33:41.6879210Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T09:33:41.6880569Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T09:33:41.6882260Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T09:33:41.6883601Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T09:33:41.6884876Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T09:33:41.6887056Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T09:33:41.6888320Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T09:33:41.6889704Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T09:33:41.6891437Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T09:33:41.6892743Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T09:33:41.6893992Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T09:33:41.6895700Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T09:33:41.6896999Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T09:33:41.6898250Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T09:33:41.6899942Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T09:33:41.6902017Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T09:33:41.6903249Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T09:33:41.6905236Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 
2025-12-04T09:33:41.6906529Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T09:33:41.6907883Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T09:33:41.6909710Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T09:33:41.6910951Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T09:33:41.6912199Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T09:33:41.6913858Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T09:33:41.6915051Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T09:33:41.6916328Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T09:33:41.6918077Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T09:33:41.6919322Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T09:33:41.6920590Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T09:33:41.6922392Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T09:33:41.6923787Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T09:33:41.6925272Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T09:33:41.6927016Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T09:33:41.6928388Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T09:33:41.6929671Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T09:33:41.6931396Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T09:33:41.6932693Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T09:33:41.6933987Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T09:33:41.6935752Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T09:33:41.6937027Z * [new branch] gh/malfet/608/head -> 
origin/gh/malfet/608/head 2025-12-04T09:33:41.6938324Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T09:33:41.6940569Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T09:33:41.6941828Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T09:33:41.6943258Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T09:33:41.6945139Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T09:33:41.6946350Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T09:33:41.6947734Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T09:33:41.6949443Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T09:33:41.6950693Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T09:33:41.6951964Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T09:33:41.6953546Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T09:33:41.6954807Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T09:33:41.6956149Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T09:33:41.6957952Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T09:33:41.6959206Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T09:33:41.6961545Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T09:33:41.6962988Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T09:33:41.6964298Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T09:33:41.6966524Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T09:33:41.6968628Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T09:33:41.6969901Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 
2025-12-04T09:33:41.6971172Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T09:33:41.6973515Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T09:33:41.6974908Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T09:33:41.6976459Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T09:33:41.6977758Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T09:33:41.6979278Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T09:33:41.6980578Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T09:33:41.6982106Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T09:33:41.6983333Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T09:33:41.6984822Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T09:33:41.6986044Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T09:33:41.6987946Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T09:33:41.6989161Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T09:33:41.6990823Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T09:33:41.6992071Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T09:33:41.6994254Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T09:33:41.6995560Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T09:33:41.6997202Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T09:33:41.6998470Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T09:33:41.7000023Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T09:33:41.7001419Z 
* [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T09:33:41.7003593Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T09:33:41.7004794Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T09:33:41.7006531Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T09:33:41.7007814Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T09:33:41.7009536Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T09:33:41.7010814Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T09:33:41.7012058Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T09:33:41.7013929Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T09:33:41.7015179Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T09:33:41.7016418Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T09:33:41.7018351Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T09:33:41.7019612Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T09:33:41.7020973Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T09:33:41.7022861Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T09:33:41.7024106Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T09:33:41.7025406Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T09:33:41.7027266Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 
2025-12-04T09:33:41.7028547Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T09:33:41.7029831Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T09:33:41.7031687Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T09:33:41.7032926Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T09:33:41.7034129Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T09:33:41.7035922Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T09:33:41.7037169Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T09:33:41.7038482Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T09:33:41.7040775Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T09:33:41.7042167Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T09:33:41.7043669Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T09:33:41.7045549Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T09:33:41.7047005Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T09:33:41.7048392Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T09:33:41.7050220Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T09:33:41.7051843Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T09:33:41.7053106Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T09:33:41.7054645Z * [new branch] gh/mikaylagawarecki/354/base -> 
origin/gh/mikaylagawarecki/354/base 2025-12-04T09:33:41.7055933Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T09:33:41.7057284Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T09:33:41.7059506Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T09:33:41.7060873Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T09:33:41.7062156Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T09:33:41.7064296Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T09:33:41.7065581Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T09:33:41.7066892Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T09:33:41.7068823Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T09:33:41.7070163Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T09:33:41.7071470Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T09:33:41.7073268Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T09:33:41.7074619Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T09:33:41.7075911Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T09:33:41.7078263Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T09:33:41.7079581Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T09:33:41.7080830Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T09:33:41.7082736Z * [new branch] 
gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T09:33:41.7084338Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T09:33:41.7085607Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T09:33:41.7087704Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T09:33:41.7089128Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T09:33:41.7090540Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T09:33:41.7093210Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T09:33:41.7094567Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T09:33:41.7095884Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T09:33:41.7097949Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T09:33:41.7099276Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T09:33:41.7100709Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T09:33:41.7102811Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T09:33:41.7104018Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T09:33:41.7105332Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T09:33:41.7107131Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T09:33:41.7108408Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T09:33:41.7109666Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 
2025-12-04T09:33:41.7111526Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T09:33:41.7112931Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T09:33:41.7114324Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T09:33:41.7116207Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T09:33:41.7117560Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T09:33:41.7118832Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T09:33:41.7120647Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T09:33:41.7121972Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T09:33:41.7123354Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T09:33:41.7125153Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T09:33:41.7126401Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T09:33:41.7127619Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T09:33:41.7129499Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T09:33:41.7130764Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T09:33:41.7132029Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T09:33:41.7133799Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T09:33:41.7135119Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T09:33:41.7136409Z * [new branch] gh/mikaylagawarecki/373/orig -> 
origin/gh/mikaylagawarecki/373/orig 2025-12-04T09:33:41.7138142Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T09:33:41.7139429Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T09:33:41.7140747Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T09:33:41.7142449Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T09:33:41.7143816Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T09:33:41.7145094Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T09:33:41.7146950Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T09:33:41.7148375Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T09:33:41.7149609Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T09:33:41.7151448Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T09:33:41.7152827Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T09:33:41.7154212Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T09:33:41.7155939Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T09:33:41.7157291Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T09:33:41.7158598Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T09:33:41.7160349Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T09:33:41.7161619Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T09:33:41.7163047Z * [new branch] 
gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T09:33:41.7164665Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T09:33:41.7165914Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T09:33:41.7167173Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T09:33:41.7168839Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T09:33:41.7170141Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T09:33:41.7171399Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T09:33:41.7172950Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T09:33:41.7174259Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T09:33:41.7175487Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T09:33:41.7177337Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T09:33:41.7178693Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T09:33:41.7179983Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T09:33:41.7181684Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T09:33:41.7182967Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T09:33:41.7184232Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T09:33:41.7186053Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T09:33:41.7187430Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 
2025-12-04T09:33:41.7188705Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T09:33:41.7190618Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T09:33:41.7191834Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T09:33:41.7193330Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T09:33:41.7195301Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T09:33:41.7197027Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T09:33:41.7198405Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T09:33:41.7199952Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T09:33:41.7204082Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T09:33:41.7205904Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T09:33:41.7207791Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T09:33:41.7209070Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T09:33:41.7210347Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T09:33:41.7212753Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T09:33:41.7214011Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T09:33:41.7215268Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T09:33:41.7217677Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T09:33:41.7219096Z * [new branch] gh/mikaylagawarecki/391/head -> 
origin/gh/mikaylagawarecki/391/head 2025-12-04T09:33:41.7220414Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T09:33:41.7222373Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T09:33:41.7223701Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T09:33:41.7224998Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T09:33:41.7227010Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T09:33:41.7228296Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T09:33:41.7229592Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T09:33:41.7231392Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T09:33:41.7232713Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T09:33:41.7233988Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T09:33:41.7235495Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T09:33:41.7236876Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T09:33:41.7238133Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T09:33:41.7239673Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T09:33:41.7240949Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T09:33:41.7242282Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T09:33:41.7244039Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T09:33:41.7245301Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T09:33:41.7246546Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T09:33:41.7248135Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T09:33:41.7249589Z * [new branch] gh/mlazos/48/head -> 
origin/gh/mlazos/48/head 2025-12-04T09:33:41.7250783Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T09:33:41.7252439Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T09:33:41.7253703Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T09:33:41.7255685Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T09:33:41.7257162Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T09:33:41.7258405Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T09:33:41.7259659Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T09:33:41.7261429Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T09:33:41.7262456Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T09:33:41.7263707Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T09:33:41.7265389Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T09:33:41.7266710Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T09:33:41.7267996Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T09:33:41.7269692Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T09:33:41.7271029Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T09:33:41.7272243Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T09:33:41.7274470Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T09:33:41.7275743Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T09:33:41.7277031Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T09:33:41.7278628Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T09:33:41.7279928Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T09:33:41.7281205Z * [new branch] gh/mlazos/55/orig -> 
origin/gh/mlazos/55/orig 2025-12-04T09:33:41.7283060Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T09:33:41.7284415Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T09:33:41.7285654Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T09:33:41.7287313Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T09:33:41.7288584Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T09:33:41.7289759Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T09:33:41.7291496Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T09:33:41.7292796Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T09:33:41.7294077Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T09:33:41.7295792Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T09:33:41.7297057Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T09:33:41.7298264Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T09:33:41.7301649Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T09:33:41.7302746Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T09:33:41.7303197Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T09:33:41.7305412Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T09:33:41.7306682Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T09:33:41.7308020Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T09:33:41.7309753Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T09:33:41.7311021Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T09:33:41.7312869Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T09:33:41.7314690Z * [new branch] gh/mlazos/63/base -> 
origin/gh/mlazos/63/base 2025-12-04T09:33:41.7316031Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T09:33:41.7317314Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T09:33:41.7319036Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T09:33:41.7320497Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T09:33:41.7321731Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T09:33:41.7323611Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T09:33:41.7324864Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T09:33:41.7326122Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T09:33:41.7327861Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T09:33:41.7329116Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T09:33:41.7330386Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T09:33:41.7332062Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T09:33:41.7333388Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T09:33:41.7334602Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T09:33:41.7336316Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T09:33:41.7337673Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T09:33:41.7338963Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T09:33:41.7340697Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T09:33:41.7341966Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T09:33:41.7343204Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T09:33:41.7353059Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T09:33:41.7353377Z * [new branch] gh/mlazos/70/head -> 
origin/gh/mlazos/70/head 2025-12-04T09:33:41.7353632Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T09:33:41.7353868Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T09:33:41.7354101Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T09:33:41.7354350Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T09:33:41.7354583Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T09:33:41.7354955Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T09:33:41.7356042Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T09:33:41.7357802Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T09:33:41.7359100Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T09:33:41.7360347Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T09:33:41.7362508Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T09:33:41.7363958Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T09:33:41.7366065Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T09:33:41.7367542Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T09:33:41.7369356Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T09:33:41.7371640Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T09:33:41.7372939Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T09:33:41.7374372Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T09:33:41.7376026Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T09:33:41.7377345Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T09:33:41.7378668Z * [new branch] gh/naveenthangudu/2/orig -> 
origin/gh/naveenthangudu/2/orig 2025-12-04T09:33:41.7380258Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T09:33:41.7381541Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T09:33:41.7382863Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T09:33:41.7384556Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T09:33:41.7385790Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T09:33:41.7387311Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T09:33:41.7389095Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T09:33:41.7390384Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T09:33:41.7391870Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T09:33:41.7393543Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T09:33:41.7394863Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T09:33:41.7396061Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T09:33:41.7397728Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T09:33:41.7399006Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T09:33:41.7400196Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T09:33:41.7402074Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T09:33:41.7403523Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T09:33:41.7405229Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T09:33:41.7407187Z * [new branch] gh/naveenthangudu/9/base -> 
origin/gh/naveenthangudu/9/base 2025-12-04T09:33:41.7408346Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T09:33:41.7409657Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T09:33:41.7411612Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T09:33:41.7412957Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T09:33:41.7414214Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T09:33:41.7415982Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T09:33:41.7417253Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T09:33:41.7418499Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T09:33:41.7420098Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T09:33:41.7421451Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T09:33:41.7422803Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T09:33:41.7424962Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T09:33:41.7426265Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T09:33:41.7427530Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T09:33:41.7429219Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T09:33:41.7430550Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T09:33:41.7431838Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T09:33:41.7433627Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T09:33:41.7434870Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T09:33:41.7436125Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T09:33:41.7437680Z * [new 
branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T09:33:41.7438959Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T09:33:41.7440316Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T09:33:41.7442002Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T09:33:41.7443408Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T09:33:41.7444645Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T09:33:41.7446408Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T09:33:41.7447680Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T09:33:41.7448927Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T09:33:41.7450595Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T09:33:41.7451866Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T09:33:41.7453157Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T09:33:41.7454861Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T09:33:41.7456197Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T09:33:41.7457666Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T09:33:41.7459235Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T09:33:41.7460555Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T09:33:41.7461810Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T09:33:41.7463491Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T09:33:41.7464751Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T09:33:41.7466022Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T09:33:41.7468235Z * [new branch] 
gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T09:33:41.7469514Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T09:33:41.7470779Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T09:33:41.7472751Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T09:33:41.7474122Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T09:33:41.7475392Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T09:33:41.7477045Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T09:33:41.7478314Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T09:33:41.7479580Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T09:33:41.7481226Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T09:33:41.7482494Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T09:33:41.7483836Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T09:33:41.7485456Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T09:33:41.7486691Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T09:33:41.7487930Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T09:33:41.7489672Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T09:33:41.7491054Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T09:33:41.7492332Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T09:33:41.7494033Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T09:33:41.7495283Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T09:33:41.7497237Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T09:33:41.7498641Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T09:33:41.7499843Z * [new branch] 
gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T09:33:41.7501316Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T09:33:41.7503052Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T09:33:41.7504314Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T09:33:41.7505635Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T09:33:41.7507412Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T09:33:41.7508753Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T09:33:41.7510162Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T09:33:41.7511661Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T09:33:41.7512945Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T09:33:41.7514183Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T09:33:41.7515916Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T09:33:41.7517167Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T09:33:41.7518482Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T09:33:41.7520404Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T09:33:41.7521246Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T09:33:41.7523153Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T09:33:41.7524865Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T09:33:41.7526199Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T09:33:41.7527459Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T09:33:41.7529138Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T09:33:41.7530359Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T09:33:41.7531609Z * [new branch] gh/oulgen/23/orig 
-> origin/gh/oulgen/23/orig 2025-12-04T09:33:41.7533202Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T09:33:41.7534470Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T09:33:41.7535739Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T09:33:41.7537374Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T09:33:41.7538651Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T09:33:41.7539914Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T09:33:41.7541575Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T09:33:41.7542924Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T09:33:41.7544272Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T09:33:41.7545991Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T09:33:41.7547237Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T09:33:41.7548487Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T09:33:41.7550578Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T09:33:41.7551867Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T09:33:41.7553119Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T09:33:41.7554877Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T09:33:41.7556176Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T09:33:41.7557409Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T09:33:41.7559031Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T09:33:41.7560391Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T09:33:41.7561732Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T09:33:41.7563579Z * [new branch] gh/patvig/mtia-serialization -> 
origin/gh/patvig/mtia-serialization 2025-12-04T09:33:41.7565711Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T09:33:41.7567076Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T09:33:41.7568501Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T09:33:41.7570151Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T09:33:41.7571444Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T09:33:41.7572735Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T09:33:41.7574496Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T09:33:41.7575830Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T09:33:41.7577223Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T09:33:41.7578902Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T09:33:41.7580171Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T09:33:41.7581472Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T09:33:41.7583215Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T09:33:41.7584555Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T09:33:41.7585873Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T09:33:41.7587463Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T09:33:41.7588727Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T09:33:41.7589999Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T09:33:41.7591589Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T09:33:41.7592822Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T09:33:41.7594204Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T09:33:41.7595902Z * [new branch] gh/pearu/117/base -> 
origin/gh/pearu/117/base 2025-12-04T09:33:41.7597170Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T09:33:41.7598469Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T09:33:41.7600182Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T09:33:41.7601694Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T09:33:41.7603463Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T09:33:41.7605447Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T09:33:41.7606699Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T09:33:41.7608493Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T09:33:41.7610201Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T09:33:41.7611456Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T09:33:41.7613310Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T09:33:41.7615039Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T09:33:41.7616430Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T09:33:41.7617620Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T09:33:41.7619318Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T09:33:41.7620664Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T09:33:41.7621921Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T09:33:41.7623598Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T09:33:41.7624865Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T09:33:41.7626226Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T09:33:41.7627899Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T09:33:41.7629401Z * [new branch] gh/pearu/147/head -> 
origin/gh/pearu/147/head 2025-12-04T09:33:41.7630784Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T09:33:41.7632492Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T09:33:41.7634250Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T09:33:41.7635499Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T09:33:41.7637671Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T09:33:41.7638971Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T09:33:41.7640209Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T09:33:41.7641997Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T09:33:41.7643450Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T09:33:41.7644793Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T09:33:41.7646978Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T09:33:41.7648376Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T09:33:41.7649632Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T09:33:41.7651398Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T09:33:41.7652647Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T09:33:41.7653911Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T09:33:41.7656103Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T09:33:41.7657366Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T09:33:41.7658662Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T09:33:41.7660386Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T09:33:41.7661697Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T09:33:41.7662944Z * [new branch] gh/pearu/155/orig -> 
origin/gh/pearu/155/orig 2025-12-04T09:33:41.7664701Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T09:33:41.7666030Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T09:33:41.7667381Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T09:33:41.7669616Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T09:33:41.7671173Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T09:33:41.7672421Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T09:33:41.7674351Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T09:33:41.7675645Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T09:33:41.7677015Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T09:33:41.7679014Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T09:33:41.7680287Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T09:33:41.7682020Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T09:33:41.7683444Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T09:33:41.7684861Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T09:33:41.7686549Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T09:33:41.7688047Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T09:33:41.7689298Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T09:33:41.7691162Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T09:33:41.7692436Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T09:33:41.7693760Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T09:33:41.7695634Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T09:33:41.7696891Z * [new branch] gh/pianpwk/31/head -> 
origin/gh/pianpwk/31/head 2025-12-04T09:33:41.7698175Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T09:33:41.7699694Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T09:33:41.7701128Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T09:33:41.7702626Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T09:33:41.7704139Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T09:33:41.7705410Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T09:33:41.7706660Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T09:33:41.7708723Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T09:33:41.7710285Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T09:33:41.7711909Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T09:33:41.7713607Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T09:33:41.7714932Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T09:33:41.7716317Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T09:33:41.7718294Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T09:33:41.7719670Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T09:33:41.7721363Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T09:33:41.7722707Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T09:33:41.7724026Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T09:33:41.7726237Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T09:33:41.7727416Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T09:33:41.7728680Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T09:33:41.7730334Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 
2025-12-04T09:33:41.7731579Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T09:33:41.7732843Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T09:33:41.7734525Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T09:33:41.7735735Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T09:33:41.7737131Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T09:33:41.7738820Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T09:33:41.7740114Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T09:33:41.7741356Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T09:33:41.7743059Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T09:33:41.7744312Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T09:33:41.7745586Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T09:33:41.7747228Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T09:33:41.7748448Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T09:33:41.7749730Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T09:33:41.7751930Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T09:33:41.7753188Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T09:33:41.7754653Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T09:33:41.7756305Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T09:33:41.7757570Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T09:33:41.7758876Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T09:33:41.7760485Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T09:33:41.7761757Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T09:33:41.7763061Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 
2025-12-04T09:33:41.7764835Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T09:33:41.7766058Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T09:33:41.7767402Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T09:33:41.7769025Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T09:33:41.7770295Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T09:33:41.7771675Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T09:33:41.7773307Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T09:33:41.7774558Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T09:33:41.7775811Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T09:33:41.7777513Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T09:33:41.7778673Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T09:33:41.7779978Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T09:33:41.7781608Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T09:33:41.7782847Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T09:33:41.7784069Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T09:33:41.7786168Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T09:33:41.7787535Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T09:33:41.7788817Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T09:33:41.7791020Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T09:33:41.7792274Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T09:33:41.7793557Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T09:33:41.7795235Z * [new branch] gh/robert-hardwick/5/base -> 
origin/gh/robert-hardwick/5/base 2025-12-04T09:33:41.7796513Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T09:33:41.7797837Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T09:33:41.7799521Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T09:33:41.7800776Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T09:33:41.7802357Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T09:33:41.7804181Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T09:33:41.7805523Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T09:33:41.7806813Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T09:33:41.7808486Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T09:33:41.7809728Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T09:33:41.7811004Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T09:33:41.7812675Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T09:33:41.7813973Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T09:33:41.7815201Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T09:33:41.7817271Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T09:33:41.7818576Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T09:33:41.7820279Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T09:33:41.7821565Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T09:33:41.7823274Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T09:33:41.7824557Z * [new 
branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T09:33:41.7825838Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T09:33:41.7827410Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T09:33:41.7828788Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T09:33:41.7829946Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T09:33:41.7831598Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T09:33:41.7832863Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T09:33:41.7834120Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T09:33:41.7835763Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T09:33:41.7837111Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T09:33:41.7838483Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T09:33:41.7840174Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T09:33:41.7841457Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T09:33:41.7843338Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T09:33:41.7844934Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T09:33:41.7846200Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T09:33:41.7847856Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T09:33:41.7850008Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T09:33:41.7851230Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T09:33:41.7852555Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T09:33:41.7854263Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T09:33:41.7855558Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T09:33:41.7857191Z * [new branch] 
gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T09:33:41.7858826Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T09:33:41.7860026Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T09:33:41.7861691Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T09:33:41.7863417Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T09:33:41.7864690Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T09:33:41.7866371Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T09:33:41.7867610Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T09:33:41.7868976Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T09:33:41.7870697Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T09:33:41.7871915Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T09:33:41.7873173Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T09:33:41.7874924Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T09:33:41.7876185Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T09:33:41.7877443Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T09:33:41.7879024Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T09:33:41.7880307Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T09:33:41.7881700Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T09:33:41.7883547Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T09:33:41.7884859Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T09:33:41.7886155Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T09:33:41.7887838Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T09:33:41.7889164Z * [new branch] gh/rtimpe/4/head -> 
origin/gh/rtimpe/4/head 2025-12-04T09:33:41.7891402Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T09:33:41.7892698Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T09:33:41.7893983Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T09:33:41.7895669Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T09:33:41.7896916Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T09:33:41.7898353Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T09:33:41.7900055Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T09:33:41.7901587Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T09:33:41.7902917Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T09:33:41.7904592Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T09:33:41.7905839Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T09:33:41.7907142Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T09:33:41.7908966Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T09:33:41.7910290Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T09:33:41.7911561Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T09:33:41.7913131Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T09:33:41.7914436Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T09:33:41.7915701Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T09:33:41.7917395Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T09:33:41.7918726Z * [new 
branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T09:33:41.7920012Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T09:33:41.7922672Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T09:33:41.7923664Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T09:33:41.7925161Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T09:33:41.7926825Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T09:33:41.7927978Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T09:33:41.7929327Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T09:33:41.7931109Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T09:33:41.7932283Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T09:33:41.7933676Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T09:33:41.7935325Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T09:33:41.7936407Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T09:33:41.7937666Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T09:33:41.7939362Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T09:33:41.7940594Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T09:33:41.7942085Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T09:33:41.7943726Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T09:33:41.7944896Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T09:33:41.7946165Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T09:33:41.7947906Z * [new branch] gh/seemethere/63/base 
-> origin/gh/seemethere/63/base 2025-12-04T09:33:41.7949064Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T09:33:41.7950344Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T09:33:41.7952163Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T09:33:41.7953439Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T09:33:41.7954718Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T09:33:41.7956536Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T09:33:41.7957729Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T09:33:41.7959201Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T09:33:41.7960877Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T09:33:41.7962021Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T09:33:41.7963564Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T09:33:41.7965234Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T09:33:41.7966421Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T09:33:41.7967745Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T09:33:41.7969597Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T09:33:41.7970778Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T09:33:41.7972117Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T09:33:41.7974015Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T09:33:41.7975077Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T09:33:41.7976552Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 
2025-12-04T09:33:41.7978928Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T09:33:41.7980252Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T09:33:41.7981561Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T09:33:41.7984263Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T09:33:41.7985806Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T09:33:41.7987021Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T09:33:41.7988975Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T09:33:41.7990233Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T09:33:41.7991668Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T09:33:41.7993564Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T09:33:41.7994826Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T09:33:41.7996033Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T09:33:41.7997878Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T09:33:41.7999076Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T09:33:41.8000327Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T09:33:41.8005058Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T09:33:41.8006316Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T09:33:41.8007643Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T09:33:41.8009652Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T09:33:41.8010844Z * [new branch] 
gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T09:33:41.8012154Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T09:33:41.8013870Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T09:33:41.8015175Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T09:33:41.8016432Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T09:33:41.8018391Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T09:33:41.8019720Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T09:33:41.8021000Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T09:33:41.8022963Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T09:33:41.8024211Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T09:33:41.8025514Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T09:33:41.8027415Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T09:33:41.8028653Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T09:33:41.8029988Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T09:33:41.8031878Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T09:33:41.8033416Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T09:33:41.8034625Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T09:33:41.8036525Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T09:33:41.8037859Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T09:33:41.8039037Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 
2025-12-04T09:33:41.8040911Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T09:33:41.8042042Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T09:33:41.8043528Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T09:33:41.8045255Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T09:33:41.8046762Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T09:33:41.8048553Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T09:33:41.8050560Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T09:33:41.8052072Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T09:33:41.8053297Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T09:33:41.8055783Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T09:33:41.8057028Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T09:33:41.8058457Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T09:33:41.8060300Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T09:33:41.8061500Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T09:33:41.8062793Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T09:33:41.8064938Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T09:33:41.8066126Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T09:33:41.8067829Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T09:33:41.8068987Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T09:33:41.8070616Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 
2025-12-04T09:33:41.8072423Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T09:33:41.8074029Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T09:33:41.8075239Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T09:33:41.8077366Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T09:33:41.8078567Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T09:33:41.8080097Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T09:33:41.8081849Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T09:33:41.8083165Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T09:33:41.8084672Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T09:33:41.8086499Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T09:33:41.8087674Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T09:33:41.8089452Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T09:33:41.8091345Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T09:33:41.8092648Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T09:33:41.8093871Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T09:33:41.8095727Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T09:33:41.8096905Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T09:33:41.8098177Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T09:33:41.8099968Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T09:33:41.8101449Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T09:33:41.8103484Z * [new branch] gh/slayton58/46/orig -> 
origin/gh/slayton58/46/orig 2025-12-04T09:33:41.8105354Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T09:33:41.8106588Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T09:33:41.8108207Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T09:33:41.8109320Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T09:33:41.8111787Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T09:33:41.8112896Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T09:33:41.8114187Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T09:33:41.8116097Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T09:33:41.8117308Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T09:33:41.8118574Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T09:33:41.8120875Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T09:33:41.8122067Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T09:33:41.8123840Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T09:33:41.8125713Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T09:33:41.8126957Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T09:33:41.8128241Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T09:33:41.8130086Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T09:33:41.8131356Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T09:33:41.8132693Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T09:33:41.8134588Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 
2025-12-04T09:33:41.8135858Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T09:33:41.8137120Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T09:33:41.8139177Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T09:33:41.8140486Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T09:33:41.8141733Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T09:33:41.8143558Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T09:33:41.8144722Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T09:33:41.8146169Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T09:33:41.8147933Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T09:33:41.8149113Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T09:33:41.8150390Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T09:33:41.8152361Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T09:33:41.8153490Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T09:33:41.8154762Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T09:33:41.8156787Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T09:33:41.8157936Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T09:33:41.8159264Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T09:33:41.8161081Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T09:33:41.8162272Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T09:33:41.8163662Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T09:33:41.8165783Z * [new 
branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T09:33:41.8167522Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T09:33:41.8168767Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T09:33:41.8170501Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T09:33:41.8171633Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T09:33:41.8172891Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T09:33:41.8174851Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T09:33:41.8176021Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T09:33:41.8177284Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T09:33:41.8179104Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T09:33:41.8180389Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T09:33:41.8181692Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T09:33:41.8184186Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T09:33:41.8186002Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T09:33:41.8187176Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T09:33:41.8189652Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T09:33:41.8190866Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T09:33:41.8192179Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T09:33:41.8194137Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T09:33:41.8195344Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T09:33:41.8196586Z * [new branch] gh/soulitzer/374/orig -> 
origin/gh/soulitzer/374/orig 2025-12-04T09:33:41.8198492Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T09:33:41.8199666Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T09:33:41.8201003Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T09:33:41.8202990Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T09:33:41.8204243Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T09:33:41.8205546Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T09:33:41.8207352Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T09:33:41.8208578Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T09:33:41.8209862Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T09:33:41.8211842Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T09:33:41.8213000Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T09:33:41.8214253Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T09:33:41.8216091Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T09:33:41.8217258Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T09:33:41.8218491Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T09:33:41.8220297Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T09:33:41.8221466Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T09:33:41.8222751Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T09:33:41.8224587Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T09:33:41.8225758Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 
2025-12-04T09:33:41.8227028Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T09:33:41.8229021Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T09:33:41.8230184Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T09:33:41.8231460Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T09:33:41.8233261Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T09:33:41.8234422Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T09:33:41.8235693Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T09:33:41.8237494Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T09:33:41.8238654Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T09:33:41.8239897Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T09:33:41.8242630Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T09:33:41.8245197Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T09:33:41.8246406Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T09:33:41.8247872Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T09:33:41.8249614Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T09:33:41.8250942Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T09:33:41.8252091Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T09:33:41.8253931Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T09:33:41.8255020Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T09:33:41.8256334Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T09:33:41.8258211Z * [new branch] 
gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T09:33:41.8259408Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T09:33:41.8260657Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T09:33:41.8262439Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T09:33:41.8263707Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T09:33:41.8265126Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T09:33:41.8266893Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T09:33:41.8268054Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T09:33:41.8269337Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T09:33:41.8271094Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T09:33:41.8272265Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T09:33:41.8273770Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T09:33:41.8275985Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T09:33:41.8277271Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T09:33:41.8278549Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T09:33:41.8280423Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T09:33:41.8281923Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T09:33:41.8283176Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T09:33:41.8285131Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T09:33:41.8286955Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T09:33:41.8288130Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 
2025-12-04T09:33:41.8290731Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T09:33:41.8291980Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T09:33:41.8293435Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T09:33:41.8295218Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T09:33:41.8296356Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T09:33:41.8297575Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T09:33:41.8299577Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T09:33:41.8301062Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T09:33:41.8302674Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T09:33:41.8304503Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T09:33:41.8305576Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T09:33:41.8306992Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T09:33:41.8308764Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T09:33:41.8310273Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T09:33:41.8311496Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T09:33:41.8313903Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T09:33:41.8315099Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T09:33:41.8316515Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T09:33:41.8318298Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T09:33:41.8319727Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T09:33:41.8320923Z * [new branch] gh/swolchok/867/orig -> 
origin/gh/swolchok/867/orig 2025-12-04T09:33:41.8323246Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T09:33:41.8324419Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T09:33:41.8325701Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T09:33:41.8327590Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T09:33:41.8328790Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T09:33:41.8330655Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T09:33:41.8332551Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T09:33:41.8333702Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T09:33:41.8334967Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T09:33:41.8336896Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T09:33:41.8338385Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T09:33:41.8339918Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T09:33:41.8342053Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T09:33:41.8343899Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T09:33:41.8345093Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T09:33:41.8347306Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T09:33:41.8348485Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T09:33:41.8349766Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T09:33:41.8351602Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T09:33:41.8352786Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T09:33:41.8354670Z * [new branch] gh/tianyu-l/4/base -> 
origin/gh/tianyu-l/4/base 2025-12-04T09:33:41.8355828Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T09:33:41.8357106Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T09:33:41.8359790Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T09:33:41.8360935Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T09:33:41.8362285Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T09:33:41.8364148Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T09:33:41.8365327Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T09:33:41.8366579Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T09:33:41.8368599Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T09:33:41.8369681Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T09:33:41.8370973Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T09:33:41.8372984Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T09:33:41.8374185Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T09:33:41.8375493Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T09:33:41.8377685Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T09:33:41.8378826Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T09:33:41.8380061Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T09:33:41.8381923Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T09:33:41.8383117Z * [new branch] gh/tugsbayasgalan/32/head -> 
origin/gh/tugsbayasgalan/32/head 2025-12-04T09:33:41.8384387Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T09:33:41.8386320Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T09:33:41.8387627Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T09:33:41.8388856Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T09:33:41.8390822Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T09:33:41.8391959Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T09:33:41.8393253Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T09:33:41.8395096Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T09:33:41.8396291Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T09:33:41.8397550Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T09:33:41.8399314Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T09:33:41.8400525Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T09:33:41.8404499Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T09:33:41.8406076Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T09:33:41.8407255Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T09:33:41.8408526Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T09:33:41.8410492Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T09:33:41.8411809Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T09:33:41.8412972Z * [new branch] 
gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T09:33:41.8414614Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T09:33:41.8415854Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T09:33:41.8417127Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T09:33:41.8419055Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T09:33:41.8420195Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T09:33:41.8421486Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T09:33:41.8423442Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T09:33:41.8424772Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T09:33:41.8426039Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T09:33:41.8428139Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T09:33:41.8429393Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T09:33:41.8430660Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T09:33:41.8432388Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T09:33:41.8433566Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T09:33:41.8434812Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T09:33:41.8436480Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T09:33:41.8437660Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T09:33:41.8438973Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T09:33:41.8441284Z * [new 
branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T09:33:41.8442459Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T09:33:41.8443828Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T09:33:41.8445896Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T09:33:41.8447117Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T09:33:41.8448424Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T09:33:41.8450325Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T09:33:41.8451509Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T09:33:41.8452774Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T09:33:41.8454832Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T09:33:41.8456029Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T09:33:41.8457303Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T09:33:41.8459181Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T09:33:41.8460433Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T09:33:41.8461843Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T09:33:41.8464090Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T09:33:41.8466000Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T09:33:41.8467240Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T09:33:41.8469303Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 
2025-12-04T09:33:41.8470648Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T09:33:41.8472023Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T09:33:41.8474043Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T09:33:41.8475298Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T09:33:41.8476577Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T09:33:41.8478504Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T09:33:41.8479771Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T09:33:41.8481064Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T09:33:41.8483421Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T09:33:41.8484687Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T09:33:41.8485975Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T09:33:41.8487908Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T09:33:41.8489103Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T09:33:41.8490379Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T09:33:41.8492102Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T09:33:41.8493372Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T09:33:41.8494591Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T09:33:41.8496638Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T09:33:41.8497781Z * [new branch] gh/tugsbayasgalan/77/head -> 
origin/gh/tugsbayasgalan/77/head 2025-12-04T09:33:41.8499061Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T09:33:41.8501393Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T09:33:41.8502766Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T09:33:41.8504059Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T09:33:41.8506025Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T09:33:41.8507252Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T09:33:41.8508525Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T09:33:41.8510492Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T09:33:41.8511624Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T09:33:41.8513059Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T09:33:41.8514753Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T09:33:41.8516443Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T09:33:41.8517599Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T09:33:41.8519606Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T09:33:41.8520718Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T09:33:41.8522044Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T09:33:41.8524805Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T09:33:41.8526138Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T09:33:41.8527448Z * [new branch] 
gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T09:33:41.8529169Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T09:33:41.8531085Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T09:33:41.8532312Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T09:33:41.8534527Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T09:33:41.8535723Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T09:33:41.8537019Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T09:33:41.8539271Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T09:33:41.8540462Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T09:33:41.8541780Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T09:33:41.8543669Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T09:33:41.8544936Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T09:33:41.8546207Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T09:33:41.8548465Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T09:33:41.8549636Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T09:33:41.8550972Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T09:33:41.8552903Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T09:33:41.8554071Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T09:33:41.8555378Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T09:33:41.8557375Z 
* [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T09:33:41.8558956Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T09:33:41.8560206Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T09:33:41.8562030Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T09:33:41.8563233Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T09:33:41.8564544Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T09:33:41.8567503Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T09:33:41.8568538Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T09:33:41.8569811Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T09:33:41.8571959Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T09:33:41.8573119Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T09:33:41.8574355Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T09:33:41.8576410Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T09:33:41.8577633Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T09:33:41.8579014Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T09:33:41.8581019Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T09:33:41.8582249Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T09:33:41.8583626Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T09:33:41.8585955Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T09:33:41.8587058Z * [new 
branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T09:33:41.8588287Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T09:33:41.8589937Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T09:33:41.8591785Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T09:33:41.8593148Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T09:33:41.8594909Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T09:33:41.8596084Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T09:33:41.8597542Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T09:33:41.8599211Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T09:33:41.8600412Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T09:33:41.8602034Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T09:33:41.8603935Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T09:33:41.8605200Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T09:33:41.8606513Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T09:33:41.8608322Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T09:33:41.8609507Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T09:33:41.8611001Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T09:33:41.8613205Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T09:33:41.8614439Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T09:33:41.8616057Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T09:33:41.8617259Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T09:33:41.8618508Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T09:33:41.8620432Z * [new branch] gh/vishal9-team/3/base -> 
origin/gh/vishal9-team/3/base 2025-12-04T09:33:41.8621586Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T09:33:41.8623042Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T09:33:41.8624644Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T09:33:41.8625812Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T09:33:41.8627159Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T09:33:41.8629319Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T09:33:41.8631064Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T09:33:41.8632792Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T09:33:41.8634839Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T09:33:41.8636139Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T09:33:41.8637635Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T09:33:41.8639402Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T09:33:41.8640596Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T09:33:41.8642064Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T09:33:41.8643996Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T09:33:41.8645269Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T09:33:41.8646540Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T09:33:41.8648391Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T09:33:41.8649548Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T09:33:41.8650835Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T09:33:41.8652674Z * [new branch] 
gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T09:33:41.8653900Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T09:33:41.8655190Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T09:33:41.8656887Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T09:33:41.8658110Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T09:33:41.8659683Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T09:33:41.8661196Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T09:33:41.8662515Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T09:33:41.8663806Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T09:33:41.8665434Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T09:33:41.8666933Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T09:33:41.8668102Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T09:33:41.8670003Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T09:33:41.8671134Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T09:33:41.8672463Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T09:33:41.8674033Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T09:33:41.8675312Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T09:33:41.8676899Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T09:33:41.8678500Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T09:33:41.8679683Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T09:33:41.8680964Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 
2025-12-04T09:33:41.8682841Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T09:33:41.8684080Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T09:33:41.8685395Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T09:33:41.8687995Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T09:33:41.8689668Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T09:33:41.8691049Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T09:33:41.8694266Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T09:33:41.8695075Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T09:33:41.8696094Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T09:33:41.8697689Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T09:33:41.8699005Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T09:33:41.8700317Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T09:33:41.8702134Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T09:33:41.8703502Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T09:33:41.8704700Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T09:33:41.8707153Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T09:33:41.8708741Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T09:33:41.8710180Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T09:33:41.8712188Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T09:33:41.8713476Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T09:33:41.8715402Z * [new branch] gh/wconstab/461/orig -> 
origin/gh/wconstab/461/orig 2025-12-04T09:33:41.8717030Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T09:33:41.8718412Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T09:33:41.8719801Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T09:33:41.8721617Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T09:33:41.8723128Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T09:33:41.8724413Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T09:33:41.8726183Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T09:33:41.8727624Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T09:33:41.8728906Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T09:33:41.8730567Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T09:33:41.8731905Z * [new branch] gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T09:33:41.8733193Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T09:33:41.8735073Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T09:33:41.8736260Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T09:33:41.8737440Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T09:33:41.8739594Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T09:33:41.8741025Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T09:33:41.8742275Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T09:33:41.8743869Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T09:33:41.8745152Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T09:33:41.8746402Z * [new branch] 
gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T09:33:41.8748711Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T09:33:41.8750093Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T09:33:41.8751490Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T09:33:41.8753402Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T09:33:41.8754731Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T09:33:41.8755994Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T09:33:41.8757847Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T09:33:41.8759236Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T09:33:41.8760623Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T09:33:41.8763014Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T09:33:41.8764338Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T09:33:41.8765626Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T09:33:41.8767509Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T09:33:41.8768975Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T09:33:41.8770260Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T09:33:41.8771996Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T09:33:41.8773281Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T09:33:41.8774514Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T09:33:41.8776392Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T09:33:41.8777723Z * [new branch] 
gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T09:33:41.8779047Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T09:33:41.8780904Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T09:33:41.8782090Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T09:33:41.8783358Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T09:33:41.8785382Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T09:33:41.8786815Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T09:33:41.8788153Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T09:33:41.8789793Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T09:33:41.8791234Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T09:33:41.8792873Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T09:33:41.8794736Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T09:33:41.8796091Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T09:33:41.8797373Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T09:33:41.8799174Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T09:33:41.8801224Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T09:33:41.8804930Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T09:33:41.8806736Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T09:33:41.8808089Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T09:33:41.8809482Z * [new branch] 
gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T09:33:41.8812677Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T09:33:41.8813990Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T09:33:41.8815305Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T09:33:41.8816894Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T09:33:41.8818136Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T09:33:41.8819440Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T09:33:41.8821213Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T09:33:41.8822606Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T09:33:41.8823868Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T09:33:41.8825759Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T09:33:41.8827147Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T09:33:41.8828419Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T09:33:41.8830240Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T09:33:41.8831568Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T09:33:41.8832857Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T09:33:41.8835087Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T09:33:41.8836560Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T09:33:41.8837735Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T09:33:41.8840007Z * [new branch] 
gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T09:33:41.8841420Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T09:33:41.8842836Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T09:33:41.8844843Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T09:33:41.8846154Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T09:33:41.8847439Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T09:33:41.8849210Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T09:33:41.8850456Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T09:33:41.8851756Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T09:33:41.8853366Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T09:33:41.8854641Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T09:33:41.8855938Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T09:33:41.8857953Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T09:33:41.8859182Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T09:33:41.8860484Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T09:33:41.8862852Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T09:33:41.8864163Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T09:33:41.8865496Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T09:33:41.8871486Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T09:33:41.8872884Z * [new branch] 
gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T09:33:41.8874206Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T09:33:41.8875984Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T09:33:41.8877194Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T09:33:41.8878393Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T09:33:41.8880246Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T09:33:41.8881535Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T09:33:41.8882914Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T09:33:41.8884898Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T09:33:41.8886199Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T09:33:41.8887449Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T09:33:41.8889168Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T09:33:41.8890526Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T09:33:41.8891771Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T09:33:41.8893589Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T09:33:41.8894810Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T09:33:41.8896011Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T09:33:41.8897918Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T09:33:41.8899245Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T09:33:41.8900511Z * [new branch] 
gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T09:33:41.8902597Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T09:33:41.8904340Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T09:33:41.8905633Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T09:33:41.8907443Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T09:33:41.8908775Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T09:33:41.8910027Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T09:33:41.8911799Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T09:33:41.8913075Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T09:33:41.8914360Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T09:33:41.8916186Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T09:33:41.8917480Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T09:33:41.8918760Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T09:33:41.8920640Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T09:33:41.8921974Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T09:33:41.8923428Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T09:33:41.8925285Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T09:33:41.8926603Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T09:33:41.8927837Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T09:33:41.8929524Z * [new branch] 
gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T09:33:41.8930700Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T09:33:41.8931952Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T09:33:41.8933977Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T09:33:41.8935319Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T09:33:41.8936577Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T09:33:41.8938516Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T09:33:41.8939795Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T09:33:41.8941248Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T09:33:41.8942889Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T09:33:41.8944278Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T09:33:41.8945589Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T09:33:41.8947336Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T09:33:41.8948610Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T09:33:41.8949883Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T09:33:41.8951741Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T09:33:41.8953055Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T09:33:41.8954357Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T09:33:41.8956135Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T09:33:41.8957521Z * [new branch] 
gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T09:33:41.8958793Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T09:33:41.8960579Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T09:33:41.8970904Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T09:33:41.8971569Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T09:33:41.8971868Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T09:33:41.8972177Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T09:33:41.8972453Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T09:33:41.8972728Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T09:33:41.8973037Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T09:33:41.8973313Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T09:33:41.8974862Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T09:33:41.8976139Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T09:33:41.8977536Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T09:33:41.8979501Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T09:33:41.8980799Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T09:33:41.8982390Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T09:33:41.8983539Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T09:33:41.8985229Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T09:33:41.8986467Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T09:33:41.8987754Z * [new branch] 
gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T09:33:41.8989385Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T09:33:41.8990738Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T09:33:41.8992046Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T09:33:41.8994223Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T09:33:41.8995373Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T09:33:41.8996618Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T09:33:41.8998709Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T09:33:41.8999997Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T09:33:41.9001288Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T09:33:41.9003379Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T09:33:41.9004508Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T09:33:41.9006256Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T09:33:41.9007944Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T09:33:41.9009364Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T09:33:41.9010590Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T09:33:41.9012256Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T09:33:41.9013487Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T09:33:41.9014733Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T09:33:41.9016510Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T09:33:41.9017766Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T09:33:41.9019079Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T09:33:41.9020759Z * [new branch] gh/xmfan/313/base 
-> origin/gh/xmfan/313/base 2025-12-04T09:33:41.9022038Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T09:33:41.9023307Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T09:33:41.9025471Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T09:33:41.9026797Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T09:33:41.9028015Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T09:33:41.9029890Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T09:33:41.9031484Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T09:33:41.9032755Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T09:33:41.9034500Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T09:33:41.9035728Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T09:33:41.9037051Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T09:33:41.9039116Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T09:33:41.9040432Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T09:33:41.9041728Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T09:33:41.9043937Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T09:33:41.9045195Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T09:33:41.9046550Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T09:33:41.9048633Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T09:33:41.9049955Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T09:33:41.9051217Z * [new branch] gh/yanbing-j/11/orig -> 
origin/gh/yanbing-j/11/orig 2025-12-04T09:33:41.9052921Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T09:33:41.9054180Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T09:33:41.9055468Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T09:33:41.9057189Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T09:33:41.9058496Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T09:33:41.9059801Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T09:33:41.9061599Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T09:33:41.9062854Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T09:33:41.9064142Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T09:33:41.9065706Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T09:33:41.9067001Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T09:33:41.9068227Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T09:33:41.9069820Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T09:33:41.9071088Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T09:33:41.9072395Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T09:33:41.9074088Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T09:33:41.9075390Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T09:33:41.9076614Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T09:33:41.9078423Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T09:33:41.9079673Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T09:33:41.9080913Z * [new branch] 
gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T09:33:41.9082691Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T09:33:41.9084128Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T09:33:41.9085802Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T09:33:41.9087028Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T09:33:41.9088348Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T09:33:41.9090091Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T09:33:41.9091364Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T09:33:41.9092655Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T09:33:41.9094433Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T09:33:41.9095734Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T09:33:41.9097073Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T09:33:41.9098778Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T09:33:41.9100050Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T09:33:41.9101331Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T09:33:41.9103154Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T09:33:41.9104384Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T09:33:41.9105648Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T09:33:41.9107850Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T09:33:41.9109306Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T09:33:41.9110781Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 
2025-12-04T09:33:41.9112545Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T09:33:41.9114142Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T09:33:41.9115726Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T09:33:41.9117447Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T09:33:41.9118762Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T09:33:41.9120089Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T09:33:41.9122080Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T09:33:41.9123512Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T09:33:41.9124786Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T09:33:41.9126465Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T09:33:41.9127803Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T09:33:41.9129160Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T09:33:41.9130859Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T09:33:41.9132121Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T09:33:41.9133350Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T09:33:41.9135038Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T09:33:41.9136350Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T09:33:41.9138006Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T09:33:41.9139667Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T09:33:41.9140935Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T09:33:41.9142352Z * [new branch] 
gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T09:33:41.9143997Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T09:33:41.9145276Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T09:33:41.9146582Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T09:33:41.9148251Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T09:33:41.9149674Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T09:33:41.9150806Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T09:33:41.9152890Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T09:33:41.9154119Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T09:33:41.9155308Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T09:33:41.9157030Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T09:33:41.9158274Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T09:33:41.9159577Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T09:33:41.9161516Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T09:33:41.9163119Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T09:33:41.9164372Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T09:33:41.9165994Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T09:33:41.9167145Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T09:33:41.9168435Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T09:33:41.9170207Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T09:33:41.9171972Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T09:33:41.9173360Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 
2025-12-04T09:33:41.9175045Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T09:33:41.9176317Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T09:33:41.9177545Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T09:33:41.9179236Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T09:33:41.9180618Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T09:33:41.9181869Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T09:33:41.9183549Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T09:33:41.9184905Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T09:33:41.9186222Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T09:33:41.9188009Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T09:33:41.9189644Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T09:33:41.9190899Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T09:33:41.9192862Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T09:33:41.9194160Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T09:33:41.9195403Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T09:33:41.9197235Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T09:33:41.9198556Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T09:33:41.9199789Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T09:33:41.9201491Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T09:33:41.9206088Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T09:33:41.9207279Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T09:33:41.9208776Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 
2025-12-04T09:33:41.9210051Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T09:33:41.9211329Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T09:33:41.9212835Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T09:33:41.9214612Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T09:33:41.9215883Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T09:33:41.9217421Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T09:33:41.9218865Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T09:33:41.9220120Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T09:33:41.9221659Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T09:33:41.9222891Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T09:33:41.9224175Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T09:33:41.9226259Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T09:33:41.9227564Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T09:33:41.9228827Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T09:33:41.9230638Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T09:33:41.9231990Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T09:33:41.9233179Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T09:33:41.9235467Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T09:33:41.9236709Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T09:33:41.9238428Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T09:33:41.9239672Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T09:33:41.9242425Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 
2025-12-04T09:33:41.9244209Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T09:33:41.9245556Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T09:33:41.9247236Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T09:33:41.9248545Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T09:33:41.9249891Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T09:33:41.9252138Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T09:33:41.9253397Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T09:33:41.9254943Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T09:33:41.9256104Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T09:33:41.9258167Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T09:33:41.9259485Z * [new branch] gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T09:33:41.9261337Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T09:33:41.9262635Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T09:33:41.9263966Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T09:33:41.9265615Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T09:33:41.9266864Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T09:33:41.9268294Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T09:33:41.9269816Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T09:33:41.9271008Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T09:33:41.9272796Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T09:33:41.9274018Z * [new branch] gh/yushangdi/7/head -> 
origin/gh/yushangdi/7/head 2025-12-04T09:33:41.9275307Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T09:33:41.9277312Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T09:33:41.9278758Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T09:33:41.9280087Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T09:33:41.9281641Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T09:33:41.9283104Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T09:33:41.9284385Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T09:33:41.9286556Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T09:33:41.9287828Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T09:33:41.9289076Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T09:33:41.9290790Z * [new branch] gh/zklaus/20/base -> origin/gh/zklaus/20/base 2025-12-04T09:33:41.9292062Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T09:33:41.9293352Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T09:33:41.9295064Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T09:33:41.9296353Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T09:33:41.9297595Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T09:33:41.9299223Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T09:33:41.9300464Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T09:33:41.9302004Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T09:33:41.9303716Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T09:33:41.9304971Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T09:33:41.9306251Z * [new branch] 
gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T09:33:41.9307801Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T09:33:41.9309089Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T09:33:41.9310339Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T09:33:41.9312633Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T09:33:41.9313841Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T09:33:41.9315073Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T09:33:41.9317158Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T09:33:41.9318544Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T09:33:41.9319831Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T09:33:41.9321529Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T09:33:41.9322901Z * [new branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T09:33:41.9324204Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T09:33:41.9325922Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T09:33:41.9327184Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T09:33:41.9328412Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T09:33:41.9329969Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T09:33:41.9331215Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T09:33:41.9332503Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T09:33:41.9334690Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T09:33:41.9335919Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T09:33:41.9337699Z * [new branch] gh/zpcore/11/base -> 
origin/gh/zpcore/11/base 2025-12-04T09:33:41.9339012Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T09:33:41.9340234Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T09:33:41.9342417Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T09:33:41.9343724Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T09:33:41.9345091Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T09:33:41.9346879Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T09:33:41.9348079Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T09:33:41.9349316Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T09:33:41.9351114Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T09:33:41.9352568Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T09:33:41.9353805Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 2025-12-04T09:33:41.9355780Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T09:33:41.9357045Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T09:33:41.9358322Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T09:33:41.9360035Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T09:33:41.9361364Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T09:33:41.9363829Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T09:33:41.9365246Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T09:33:41.9366495Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T09:33:41.9368897Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T09:33:41.9370169Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T09:33:41.9371598Z * [new branch] gh/zpcore/22/orig -> 
origin/gh/zpcore/22/orig 2025-12-04T09:33:41.9373337Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T09:33:41.9374659Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T09:33:41.9375901Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T09:33:41.9377476Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T09:33:41.9378787Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T09:33:41.9380103Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T09:33:41.9382005Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T09:33:41.9383231Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T09:33:41.9384511Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T09:33:41.9386264Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T09:33:41.9387662Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 2025-12-04T09:33:41.9389036Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T09:33:41.9390892Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T09:33:41.9392155Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T09:33:41.9393376Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T09:33:41.9395701Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T09:33:41.9397535Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T09:33:41.9399279Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T09:33:41.9400967Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T09:33:41.9402800Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T09:33:41.9404358Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T09:33:41.9405608Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 
2025-12-04T09:33:41.9407168Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T09:33:41.9408429Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T09:33:41.9409938Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T09:33:41.9411176Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T09:33:41.9413112Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T09:33:41.9414301Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T09:33:41.9415920Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T09:33:41.9417213Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T09:33:41.9418712Z * [new branch] google-main -> origin/google-main 2025-12-04T09:33:41.9420580Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T09:33:41.9421729Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T09:33:41.9423844Z * [new branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T09:33:41.9425436Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T09:33:41.9427337Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T09:33:41.9428484Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T09:33:41.9429711Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T09:33:41.9431587Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T09:33:41.9433295Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T09:33:41.9435428Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T09:33:41.9436248Z * [new branch] inlining -> origin/inlining 2025-12-04T09:33:41.9437732Z * [new branch] 
inlining-ezyang -> origin/inlining-ezyang 2025-12-04T09:33:41.9439063Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T09:33:41.9440707Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T09:33:41.9442092Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T09:33:41.9443667Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T09:33:41.9445214Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T09:33:41.9446796Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T09:33:41.9448022Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T09:33:41.9450032Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T09:33:41.9451277Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T09:33:41.9452968Z * [new branch] jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter 2025-12-04T09:33:41.9454255Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T09:33:41.9455592Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T09:33:41.9457004Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T09:33:41.9458396Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T09:33:41.9459738Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T09:33:41.9461136Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T09:33:41.9462436Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T09:33:41.9463969Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T09:33:41.9465216Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 
2025-12-04T09:33:41.9466649Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T09:33:41.9468057Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T09:33:41.9469908Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T09:33:41.9471647Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T09:33:41.9472880Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T09:33:41.9474228Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T09:33:41.9476044Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T09:33:41.9477783Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T09:33:41.9479410Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T09:33:41.9480873Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 2025-12-04T09:33:41.9482081Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T09:33:41.9483483Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T09:33:41.9485691Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T09:33:41.9487578Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T09:33:41.9488752Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T09:33:41.9490005Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T09:33:41.9491529Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T09:33:41.9492771Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T09:33:41.9494121Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T09:33:41.9495620Z * [new 
branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T09:33:41.9497330Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T09:33:41.9498460Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T09:33:41.9499868Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T09:33:41.9501315Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T09:33:41.9502736Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T09:33:41.9503993Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T09:33:41.9505264Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T09:33:41.9506573Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 2025-12-04T09:33:41.9507937Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T09:33:41.9509136Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T09:33:41.9510828Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T09:33:41.9512208Z * [new branch] main -> origin/main 2025-12-04T09:33:41.9513720Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T09:33:41.9515256Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T09:33:41.9516656Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T09:33:41.9518141Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T09:33:41.9519518Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T09:33:41.9520977Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T09:33:41.9522478Z * [new branch] 
malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T09:33:41.9523937Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T09:33:41.9525677Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T09:33:41.9527757Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T09:33:41.9528954Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T09:33:41.9530448Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T09:33:41.9532254Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T09:33:41.9534141Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T09:33:41.9535144Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T09:33:41.9537363Z * [new branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T09:33:41.9538828Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T09:33:41.9540175Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T09:33:41.9541719Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T09:33:41.9543075Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T09:33:41.9544387Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T09:33:41.9546236Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T09:33:41.9547468Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T09:33:41.9548749Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T09:33:41.9549972Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T09:33:41.9551252Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 
2025-12-04T09:33:41.9552551Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T09:33:41.9553719Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T09:33:41.9554912Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T09:33:41.9555993Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T09:33:41.9557527Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T09:33:41.9559001Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T09:33:41.9560253Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T09:33:41.9561694Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T09:33:41.9563141Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T09:33:41.9564569Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T09:33:41.9565912Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 2025-12-04T09:33:41.9567276Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T09:33:41.9568634Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T09:33:41.9569868Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T09:33:41.9571098Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T09:33:41.9572376Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T09:33:41.9573656Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T09:33:41.9574911Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T09:33:41.9576195Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T09:33:41.9577552Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T09:33:41.9578812Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T09:33:41.9580130Z * [new branch] mlazos/extract-examples -> 
origin/mlazos/extract-examples 2025-12-04T09:33:41.9581371Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T09:33:41.9582683Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T09:33:41.9583967Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T09:33:41.9585312Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T09:33:41.9586577Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T09:33:41.9587859Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T09:33:41.9589131Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T09:33:41.9590483Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T09:33:41.9592337Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T09:33:41.9593617Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T09:33:41.9595010Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T09:33:41.9596302Z * [new branch] mlazos/hc-fixes -> origin/mlazos/hc-fixes 2025-12-04T09:33:41.9597607Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T09:33:41.9598881Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T09:33:41.9600283Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T09:33:41.9604578Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T09:33:41.9606064Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T09:33:41.9607428Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T09:33:41.9608668Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T09:33:41.9609934Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T09:33:41.9611238Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T09:33:41.9612436Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T09:33:41.9613715Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T09:33:41.9615478Z * [new branch] mlazos/hc4 -> 
origin/mlazos/hc4 2025-12-04T09:33:41.9616767Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T09:33:41.9618005Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T09:33:41.9619332Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T09:33:41.9620622Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T09:33:41.9621772Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T09:33:41.9623138Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T09:33:41.9624321Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T09:33:41.9625392Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T09:33:41.9626661Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T09:33:41.9628127Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T09:33:41.9630021Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T09:33:41.9631330Z * [new branch] mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup 2025-12-04T09:33:41.9632566Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T09:33:41.9633895Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T09:33:41.9635149Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T09:33:41.9636336Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T09:33:41.9637615Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T09:33:41.9638904Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T09:33:41.9640152Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T09:33:41.9641478Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T09:33:41.9642901Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T09:33:41.9644192Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T09:33:41.9645530Z * [new branch] 
mlazos/rtp -> origin/mlazos/rtp 2025-12-04T09:33:41.9646894Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T09:33:41.9648313Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T09:33:41.9649354Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T09:33:41.9650670Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T09:33:41.9651890Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T09:33:41.9653201Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T09:33:41.9654469Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T09:33:41.9655777Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T09:33:41.9657101Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T09:33:41.9658351Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T09:33:41.9659735Z * [new branch] mlazos/tf-mode-reland2 -> origin/mlazos/tf-mode-reland2 2025-12-04T09:33:41.9660983Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T09:33:41.9662258Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T09:33:41.9663578Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T09:33:41.9664821Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T09:33:41.9666212Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T09:33:41.9667521Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T09:33:41.9668930Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T09:33:41.9670184Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T09:33:41.9671517Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T09:33:41.9672879Z * [new branch] mlazos/user-streams-backup2 -> 
origin/mlazos/user-streams-backup2 2025-12-04T09:33:41.9674018Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T09:33:41.9675335Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T09:33:41.9676585Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T09:33:41.9678042Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T09:33:41.9679351Z * [new branch] module-shim -> origin/module-shim 2025-12-04T09:33:41.9680729Z * [new branch] move_config -> origin/move_config 2025-12-04T09:33:41.9682671Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T09:33:41.9684488Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T09:33:41.9686299Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T09:33:41.9687611Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T09:33:41.9688995Z * [new branch] nativert_num_outputs -> origin/nativert_num_outputs 2025-12-04T09:33:41.9690320Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T09:33:41.9691677Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T09:33:41.9693441Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T09:33:41.9694630Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T09:33:41.9695934Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T09:33:41.9697121Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T09:33:41.9698385Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T09:33:41.9699494Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T09:33:41.9700704Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T09:33:41.9702113Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T09:33:41.9703540Z * [new branch] nightly -> origin/nightly 
2025-12-04T09:33:41.9705765Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T09:33:41.9707208Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T09:33:41.9708430Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T09:33:41.9709920Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T09:33:41.9711454Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T09:33:41.9713146Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T09:33:41.9714487Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T09:33:41.9716150Z * [new branch] nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune 2025-12-04T09:33:41.9717391Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T09:33:41.9718756Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T09:33:41.9720078Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T09:33:41.9721915Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T09:33:41.9723445Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T09:33:41.9724846Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T09:33:41.9727208Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T09:33:41.9728565Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T09:33:41.9729903Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T09:33:41.9731439Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T09:33:41.9732834Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 
2025-12-04T09:33:41.9734338Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T09:33:41.9735718Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T09:33:41.9737071Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T09:33:41.9738343Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T09:33:41.9739634Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T09:33:41.9741047Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T09:33:41.9742237Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T09:33:41.9743504Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T09:33:41.9745210Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T09:33:41.9746982Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T09:33:41.9748568Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 2025-12-04T09:33:41.9750439Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T09:33:41.9751732Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T09:33:41.9754752Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T09:33:41.9755898Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T09:33:41.9758048Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T09:33:41.9759497Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T09:33:41.9760947Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T09:33:41.9762467Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T09:33:41.9763986Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T09:33:41.9765366Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T09:33:41.9766902Z * [new branch] pca2 -> origin/pca2 2025-12-04T09:33:41.9768399Z * [new branch] 
per_channel_backup -> origin/per_channel_backup 2025-12-04T09:33:41.9769881Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T09:33:41.9771382Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T09:33:41.9772869Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T09:33:41.9774661Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T09:33:41.9775984Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T09:33:41.9777168Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T09:33:41.9778304Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T09:33:41.9779523Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T09:33:41.9781025Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 2025-12-04T09:33:41.9782588Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T09:33:41.9784045Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T09:33:41.9785266Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T09:33:41.9786548Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T09:33:41.9788085Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T09:33:41.9789362Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T09:33:41.9790614Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T09:33:41.9791931Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T09:33:41.9793313Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 
2025-12-04T09:33:41.9794477Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T09:33:41.9795720Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T09:33:41.9797080Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T09:33:41.9798326Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T09:33:41.9799551Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T09:33:41.9801115Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T09:33:41.9802597Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T09:33:41.9803958Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T09:33:41.9805411Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T09:33:41.9806605Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T09:33:41.9807846Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T09:33:41.9809404Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T09:33:41.9810725Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T09:33:41.9812168Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T09:33:41.9813540Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T09:33:41.9814795Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T09:33:41.9816075Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T09:33:41.9817446Z * [new branch] 
pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T09:33:41.9818790Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T09:33:41.9820078Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T09:33:41.9821257Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T09:33:41.9822566Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T09:33:41.9823860Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T09:33:41.9825020Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T09:33:41.9826285Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T09:33:41.9827740Z * [new branch] pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T09:33:41.9828941Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T09:33:41.9830213Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T09:33:41.9831552Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T09:33:41.9832891Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T09:33:41.9834172Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T09:33:41.9835332Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T09:33:41.9836539Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T09:33:41.9838368Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T09:33:41.9839530Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T09:33:41.9842698Z * 
[new branch] pool-separate -> origin/pool-separate 2025-12-04T09:33:41.9843094Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T09:33:41.9844877Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T09:33:41.9846086Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T09:33:41.9847420Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T09:33:41.9848836Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T09:33:41.9850952Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T09:33:41.9853092Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T09:33:41.9854171Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T09:33:41.9856761Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T09:33:41.9858303Z * [new branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T09:33:41.9859996Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T09:33:41.9861855Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T09:33:41.9863150Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T09:33:41.9864492Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T09:33:41.9865828Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T09:33:41.9867029Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T09:33:41.9868100Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T09:33:41.9869383Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T09:33:41.9870853Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T09:33:41.9872220Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T09:33:41.9873643Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T09:33:41.9874916Z * [new branch] release/1.9 -> 
origin/release/1.9 2025-12-04T09:33:41.9876266Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T09:33:41.9877659Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T09:33:41.9879007Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T09:33:41.9880681Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T09:33:41.9882973Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T09:33:41.9884826Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T09:33:41.9886204Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T09:33:41.9887591Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T09:33:41.9889176Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T09:33:41.9890568Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T09:33:41.9891983Z * [new branch] release_notes -> origin/release_notes 2025-12-04T09:33:41.9893405Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 2025-12-04T09:33:41.9895165Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T09:33:41.9896374Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T09:33:41.9897381Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T09:33:41.9898803Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T09:33:41.9901579Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T09:33:41.9904647Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T09:33:41.9907150Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T09:33:41.9909742Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 
2025-12-04T09:33:41.9911484Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T09:33:41.9912655Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T09:33:41.9914029Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T09:33:41.9915282Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T09:33:41.9917251Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T09:33:41.9918323Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T09:33:41.9919689Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T09:33:41.9920901Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T09:33:41.9922455Z * [new branch] ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T09:33:41.9924208Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T09:33:41.9926340Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T09:33:41.9927297Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T09:33:41.9929166Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T09:33:41.9930377Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T09:33:41.9932074Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T09:33:41.9933774Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T09:33:41.9935284Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T09:33:41.9937565Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 
2025-12-04T09:33:41.9938762Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T09:33:41.9940283Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T09:33:41.9941490Z * [new branch] save -> origin/save 2025-12-04T09:33:41.9942879Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T09:33:41.9944213Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T09:33:41.9946525Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T09:33:41.9947993Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T09:33:41.9949625Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T09:33:41.9951082Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T09:33:41.9952417Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T09:33:41.9953764Z * [new branch] some_rocm_inductor_skips -> origin/some_rocm_inductor_skips 2025-12-04T09:33:41.9955521Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T09:33:41.9956949Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T09:33:41.9958301Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T09:33:41.9959647Z * [new branch] suo -> origin/suo 2025-12-04T09:33:41.9961033Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T09:33:41.9962507Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T09:33:41.9964007Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T09:33:41.9965327Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T09:33:41.9966821Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T09:33:41.9968291Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T09:33:41.9969650Z * [new branch] sy_deserialize -> origin/sy_deserialize 
2025-12-04T09:33:41.9970990Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T09:33:41.9972311Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T09:33:41.9973740Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T09:33:41.9975094Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T09:33:41.9976485Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T09:33:41.9977809Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T09:33:41.9979179Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T09:33:41.9980549Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T09:33:41.9981853Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T09:33:41.9983211Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T09:33:41.9984635Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 2025-12-04T09:33:41.9986018Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T09:33:41.9987498Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T09:33:41.9988895Z * [new branch] test-old -> origin/test-old 2025-12-04T09:33:41.9990656Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T09:33:41.9992500Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T09:33:41.9993847Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T09:33:41.9994986Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T09:33:41.9996388Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T09:33:41.9997954Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T09:33:41.9999577Z * [new branch] 
tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T09:33:42.0000938Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T09:33:42.0005289Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T09:33:42.0006533Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T09:33:42.0007769Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T09:33:42.0009103Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T09:33:42.0010399Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T09:33:42.0011663Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T09:33:42.0013080Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T09:33:42.0014375Z * [new branch] tmp -> origin/tmp 2025-12-04T09:33:42.0015778Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T09:33:42.0017209Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T09:33:42.0018738Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T09:33:42.0019941Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T09:33:42.0021359Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T09:33:42.0022714Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T09:33:42.0024073Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T09:33:42.0025915Z * [new branch] type_dec -> origin/type_dec 2025-12-04T09:33:42.0027394Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T09:33:42.0029398Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T09:33:42.0030643Z * [new branch] 
update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T09:33:42.0031926Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T09:33:42.0033262Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T09:33:42.0034376Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T09:33:42.0035954Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T09:33:42.0037801Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T09:33:42.0039531Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 2025-12-04T09:33:42.0040768Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T09:33:42.0041863Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T09:33:42.0043453Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T09:33:42.0044560Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T09:33:42.0046498Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T09:33:42.0047823Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T09:33:42.0049726Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T09:33:42.0051054Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> 
origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T09:33:42.0052221Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T09:33:42.0053676Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T09:33:42.0054958Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T09:33:42.0056341Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T09:33:42.0058183Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T09:33:42.0059536Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T09:33:42.0060977Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T09:33:42.0062246Z * [new branch] update_submodule_FBGEMM -> origin/update_submodule_FBGEMM 2025-12-04T09:33:42.0063696Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T09:33:42.0065157Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T09:33:42.0066503Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T09:33:42.0067902Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T09:33:42.0069406Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T09:33:42.0070870Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T09:33:42.0072465Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T09:33:42.0074053Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T09:33:42.0075461Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T09:33:42.0076924Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T09:33:42.0078365Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T09:33:42.0079880Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T09:33:42.0081368Z * [new branch] validations_2.8 -> 
origin/validations_2.8 2025-12-04T09:33:42.0082787Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T09:33:42.0084214Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T09:33:42.0085527Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T09:33:42.0087130Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T09:33:42.0089139Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T09:33:42.0090306Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T09:33:42.0091738Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T09:33:42.0093288Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T09:33:42.0094798Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T09:33:42.0096547Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 2025-12-04T09:33:42.0098251Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T09:33:42.0099521Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T09:33:42.0101040Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T09:33:42.0102386Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T09:33:42.0103485Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T09:33:42.0105196Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T09:33:42.0106601Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T09:33:42.0107930Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T09:33:42.0109261Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T09:33:42.0110995Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T09:33:42.0112265Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T09:33:42.0113837Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 
2025-12-04T09:33:42.0114532Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T09:33:42.0115863Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T09:33:42.0117018Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T09:33:42.0118218Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T09:33:42.0119686Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T09:33:42.0121304Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T09:33:42.0122719Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T09:33:42.0124009Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T09:33:42.0125141Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T09:33:42.0126382Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T09:33:42.0127701Z * [new branch] xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T09:33:42.0128930Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T09:33:42.0130183Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T09:33:42.0131547Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T09:33:42.0132701Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T09:33:42.0134054Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T09:33:42.0135261Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T09:33:42.0136580Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T09:33:42.0138318Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T09:33:42.0139798Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:33:42.0141111Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 
-> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T09:33:42.0141955Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T09:33:42.0143380Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T09:33:42.0144676Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T09:33:42.0146566Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T09:33:42.0147735Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T09:33:42.0148964Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T09:33:42.0150619Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T09:33:42.0152094Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T09:33:42.0153758Z * [new branch] yolo-llama3 -> origin/yolo-llama3 2025-12-04T09:33:42.0155517Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T09:33:42.0156961Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T09:33:42.0158178Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T09:33:42.0159389Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T09:33:42.0161002Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T09:33:42.0162272Z * [new branch] zb2p -> origin/zb2p 2025-12-04T09:33:42.0163815Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T09:33:42.0166369Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T09:33:42.0167678Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T09:33:42.0168813Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T09:33:42.0170699Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> 
origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T09:33:42.0172418Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T09:33:42.0173583Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T09:33:42.0174938Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T09:33:42.0176326Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T09:33:42.0177639Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T09:33:42.0179312Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T09:33:42.0180594Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T09:33:42.0181992Z * [new branch] zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards 2025-12-04T09:33:42.0183432Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T09:33:42.0184880Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T09:33:42.0186639Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T09:33:42.0188539Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T09:33:42.0189796Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T09:33:42.0191116Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T09:33:42.0192365Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T09:33:42.0193632Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T09:33:42.0195065Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T09:33:42.0196412Z * [new tag] bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug 
2025-12-04T09:33:42.0197246Z * [new tag] ci/binaries/77164 -> ci/binaries/77164 2025-12-04T09:33:42.0198491Z * [new tag] ciflow/b200/115316 -> ciflow/b200/115316 2025-12-04T09:33:42.0199235Z * [new tag] ciflow/b200/160685 -> ciflow/b200/160685 2025-12-04T09:33:42.0200088Z * [new tag] ciflow/b200/161607 -> ciflow/b200/161607 2025-12-04T09:33:42.0201004Z * [new tag] ciflow/b200/161938 -> ciflow/b200/161938 2025-12-04T09:33:42.0202370Z * [new tag] ciflow/b200/167207 -> ciflow/b200/167207 2025-12-04T09:33:42.0203136Z * [new tag] ciflow/b200/167989 -> ciflow/b200/167989 2025-12-04T09:33:42.0204198Z * [new tag] ciflow/b200/168096 -> ciflow/b200/168096 2025-12-04T09:33:42.0205166Z * [new tag] ciflow/b200/168175 -> ciflow/b200/168175 2025-12-04T09:33:42.0206023Z * [new tag] ciflow/b200/168195 -> ciflow/b200/168195 2025-12-04T09:33:42.0206870Z * [new tag] ciflow/b200/169200 -> ciflow/b200/169200 2025-12-04T09:33:42.0207925Z * [new tag] ciflow/b200/169216 -> ciflow/b200/169216 2025-12-04T09:33:42.0209338Z * [new tag] ciflow/b200/169380 -> ciflow/b200/169380 2025-12-04T09:33:42.0210747Z * [new tag] ciflow/b200/169412 -> ciflow/b200/169412 2025-12-04T09:33:42.0211605Z * [new tag] ciflow/b200/169470 -> ciflow/b200/169470 2025-12-04T09:33:42.0212711Z * [new tag] ciflow/b200/169471 -> ciflow/b200/169471 2025-12-04T09:33:42.0213463Z * [new tag] ciflow/b200/169472 -> ciflow/b200/169472 2025-12-04T09:33:42.0214709Z * [new tag] ciflow/b200/169514 -> ciflow/b200/169514 2025-12-04T09:33:42.0215442Z * [new tag] ciflow/b200/169517 -> ciflow/b200/169517 2025-12-04T09:33:42.0216713Z * [new tag] ciflow/binaries/165922 -> ciflow/binaries/165922 2025-12-04T09:33:42.0217450Z * [new tag] ciflow/binaries/169510 -> ciflow/binaries/169510 2025-12-04T09:33:42.0218725Z * [new tag] ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994 2025-12-04T09:33:42.0219555Z * [new tag] ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829 2025-12-04T09:33:42.0220369Z * [new tag] 
ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972 2025-12-04T09:33:42.0221955Z * [new tag] ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981 2025-12-04T09:33:42.0222573Z * [new tag] ciflow/dynamo/167695 -> ciflow/dynamo/167695 2025-12-04T09:33:42.0223438Z * [new tag] ciflow/dynamo/168096 -> ciflow/dynamo/168096 2025-12-04T09:33:42.0224459Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T09:33:42.0225538Z * [new tag] ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938 2025-12-04T09:33:42.0226301Z * [new tag] ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940 2025-12-04T09:33:42.0227453Z * [new tag] ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923 2025-12-04T09:33:42.0229026Z * [new tag] ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552 2025-12-04T09:33:42.0229820Z * [new tag] ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129 2025-12-04T09:33:42.0230591Z * [new tag] ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917 2025-12-04T09:33:42.0231885Z * [new tag] ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156 2025-12-04T09:33:42.0232668Z * [new tag] ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200 2025-12-04T09:33:42.0233489Z * [new tag] ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216 2025-12-04T09:33:42.0234321Z * [new tag] ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338 2025-12-04T09:33:42.0235386Z * [new tag] ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355 2025-12-04T09:33:42.0236076Z * [new tag] ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543 2025-12-04T09:33:42.0237235Z * [new tag] ciflow/h100/115316 -> ciflow/h100/115316 2025-12-04T09:33:42.0237930Z * [new tag] ciflow/h100/160685 -> ciflow/h100/160685 2025-12-04T09:33:42.0239541Z * [new tag] ciflow/h100/160729 -> ciflow/h100/160729 2025-12-04T09:33:42.0240283Z * [new tag] ciflow/h100/161607 -> ciflow/h100/161607 
2025-12-04T09:33:42.0240660Z * [new tag] ciflow/h100/161938 -> ciflow/h100/161938 2025-12-04T09:33:42.0241598Z * [new tag] ciflow/h100/167207 -> ciflow/h100/167207 2025-12-04T09:33:42.0242294Z * [new tag] ciflow/h100/167989 -> ciflow/h100/167989 2025-12-04T09:33:42.0243375Z * [new tag] ciflow/h100/168096 -> ciflow/h100/168096 2025-12-04T09:33:42.0243990Z * [new tag] ciflow/h100/168175 -> ciflow/h100/168175 2025-12-04T09:33:42.0244776Z * [new tag] ciflow/h100/168195 -> ciflow/h100/168195 2025-12-04T09:33:42.0245624Z * [new tag] ciflow/h100/168980 -> ciflow/h100/168980 2025-12-04T09:33:42.0247171Z * [new tag] ciflow/h100/169200 -> ciflow/h100/169200 2025-12-04T09:33:42.0247955Z * [new tag] ciflow/h100/169216 -> ciflow/h100/169216 2025-12-04T09:33:42.0249089Z * [new tag] ciflow/h100/169380 -> ciflow/h100/169380 2025-12-04T09:33:42.0249850Z * [new tag] ciflow/h100/169412 -> ciflow/h100/169412 2025-12-04T09:33:42.0250717Z * [new tag] ciflow/h100/169470 -> ciflow/h100/169470 2025-12-04T09:33:42.0251668Z * [new tag] ciflow/h100/169471 -> ciflow/h100/169471 2025-12-04T09:33:42.0252451Z * [new tag] ciflow/h100/169472 -> ciflow/h100/169472 2025-12-04T09:33:42.0253279Z * [new tag] ciflow/h100/169514 -> ciflow/h100/169514 2025-12-04T09:33:42.0254460Z * [new tag] ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096 2025-12-04T09:33:42.0255870Z * [new tag] ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096 2025-12-04T09:33:42.0256674Z * [new tag] ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165 2025-12-04T09:33:42.0257538Z * [new tag] ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096 2025-12-04T09:33:42.0258473Z * [new tag] ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096 2025-12-04T09:33:42.0260087Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073 
2025-12-04T09:33:42.0261371Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096 2025-12-04T09:33:42.0262312Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024 2025-12-04T09:33:42.0263387Z * [new tag] ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024 2025-12-04T09:33:42.0264294Z * [new tag] ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096 2025-12-04T09:33:42.0265186Z * [new tag] ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096 2025-12-04T09:33:42.0265977Z * [new tag] ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024 2025-12-04T09:33:42.0266946Z * [new tag] ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425 2025-12-04T09:33:42.0268211Z * [new tag] ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545 2025-12-04T09:33:42.0269065Z * [new tag] ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997 2025-12-04T09:33:42.0269865Z * [new tag] ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096 2025-12-04T09:33:42.0270810Z * [new tag] ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063 2025-12-04T09:33:42.0271625Z * [new tag] ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425 2025-12-04T09:33:42.0272838Z * [new tag] ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545 2025-12-04T09:33:42.0273477Z * [new tag] ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096 2025-12-04T09:33:42.0274289Z * [new tag] ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063 2025-12-04T09:33:42.0275079Z * [new tag] ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425 2025-12-04T09:33:42.0276276Z * [new tag] ciflow/inductor-rocm/162052 -> 
ciflow/inductor-rocm/162052 2025-12-04T09:33:42.0277010Z * [new tag] ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971 2025-12-04T09:33:42.0278109Z * [new tag] ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096 2025-12-04T09:33:42.0278964Z * [new tag] ciflow/inductor/144542 -> ciflow/inductor/144542 2025-12-04T09:33:42.0279747Z * [new tag] ciflow/inductor/146506 -> ciflow/inductor/146506 2025-12-04T09:33:42.0280580Z * [new tag] ciflow/inductor/147990 -> ciflow/inductor/147990 2025-12-04T09:33:42.0281590Z * [new tag] ciflow/inductor/148294 -> ciflow/inductor/148294 2025-12-04T09:33:42.0282368Z * [new tag] ciflow/inductor/148492 -> ciflow/inductor/148492 2025-12-04T09:33:42.0283331Z * [new tag] ciflow/inductor/157149 -> ciflow/inductor/157149 2025-12-04T09:33:42.0284159Z * [new tag] ciflow/inductor/157994 -> ciflow/inductor/157994 2025-12-04T09:33:42.0285196Z * [new tag] ciflow/inductor/160685 -> ciflow/inductor/160685 2025-12-04T09:33:42.0285867Z * [new tag] ciflow/inductor/160686 -> ciflow/inductor/160686 2025-12-04T09:33:42.0286698Z * [new tag] ciflow/inductor/160687 -> ciflow/inductor/160687 2025-12-04T09:33:42.0287505Z * [new tag] ciflow/inductor/160688 -> ciflow/inductor/160688 2025-12-04T09:33:42.0288715Z * [new tag] ciflow/inductor/160706 -> ciflow/inductor/160706 2025-12-04T09:33:42.0289900Z * [new tag] ciflow/inductor/160729 -> ciflow/inductor/160729 2025-12-04T09:33:42.0290916Z * [new tag] ciflow/inductor/161938 -> ciflow/inductor/161938 2025-12-04T09:33:42.0291707Z * [new tag] ciflow/inductor/161939 -> ciflow/inductor/161939 2025-12-04T09:33:42.0292550Z * [new tag] ciflow/inductor/161940 -> ciflow/inductor/161940 2025-12-04T09:33:42.0293408Z * [new tag] ciflow/inductor/162052 -> ciflow/inductor/162052 2025-12-04T09:33:42.0294264Z * [new tag] ciflow/inductor/162275 -> ciflow/inductor/162275 2025-12-04T09:33:42.0295109Z * [new tag] ciflow/inductor/162795 -> ciflow/inductor/162795 2025-12-04T09:33:42.0296375Z * [new tag] 
ciflow/inductor/163245 -> ciflow/inductor/163245 2025-12-04T09:33:42.0297117Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T09:33:42.0297978Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T09:33:42.0298817Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T09:33:42.0300016Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T09:33:42.0300728Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T09:33:42.0301893Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T09:33:42.0302646Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T09:33:42.0303615Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T09:33:42.0304393Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T09:33:42.0305371Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T09:33:42.0306462Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T09:33:42.0307255Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T09:33:42.0308079Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T09:33:42.0308947Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T09:33:42.0309798Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T09:33:42.0310926Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T09:33:42.0311675Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T09:33:42.0312525Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T09:33:42.0313620Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T09:33:42.0314478Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T09:33:42.0315340Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T09:33:42.0316420Z * [new tag] 
ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T09:33:42.0317198Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T09:33:42.0318067Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T09:33:42.0318914Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T09:33:42.0319787Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T09:33:42.0320619Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T09:33:42.0321496Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T09:33:42.0322425Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T09:33:42.0323678Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T09:33:42.0324505Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T09:33:42.0325357Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T09:33:42.0326216Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T09:33:42.0327061Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T09:33:42.0328543Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T09:33:42.0329292Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T09:33:42.0330149Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T09:33:42.0331022Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T09:33:42.0331867Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 2025-12-04T09:33:42.0332846Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T09:33:42.0333598Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T09:33:42.0334466Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T09:33:42.0335301Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T09:33:42.0336168Z * [new tag] 
ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T09:33:42.0337018Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T09:33:42.0337853Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T09:33:42.0338789Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T09:33:42.0339562Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T09:33:42.0340410Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T09:33:42.0341276Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T09:33:42.0342261Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T09:33:42.0343283Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T09:33:42.0344315Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T09:33:42.0345106Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T09:33:42.0345963Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T09:33:42.0346831Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T09:33:42.0347693Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T09:33:42.0348554Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T09:33:42.0349417Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T09:33:42.0350397Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T09:33:42.0351207Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T09:33:42.0352061Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T09:33:42.0353101Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T09:33:42.0353761Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T09:33:42.0354637Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T09:33:42.0355797Z * [new tag] 
ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T09:33:42.0356554Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T09:33:42.0357446Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T09:33:42.0358295Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T09:33:42.0359155Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T09:33:42.0360159Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T09:33:42.0360919Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T09:33:42.0361787Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T09:33:42.0362737Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T09:33:42.0363746Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T09:33:42.0364483Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T09:33:42.0365332Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T09:33:42.0366382Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T09:33:42.0367097Z * [new tag] ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T09:33:42.0368185Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T09:33:42.0368945Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T09:33:42.0370096Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T09:33:42.0371207Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T09:33:42.0371986Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T09:33:42.0372994Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T09:33:42.0373765Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T09:33:42.0374638Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T09:33:42.0375475Z * [new tag] 
ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T09:33:42.0376326Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T09:33:42.0377462Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T09:33:42.0378200Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T09:33:42.0379076Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T09:33:42.0380180Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T09:33:42.0380968Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T09:33:42.0381809Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T09:33:42.0382692Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T09:33:42.0383834Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T09:33:42.0385152Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T09:33:42.0386503Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T09:33:42.0387271Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T09:33:42.0388151Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T09:33:42.0389025Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T09:33:42.0389897Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T09:33:42.0391123Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T09:33:42.0391889Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T09:33:42.0392924Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T09:33:42.0393688Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T09:33:42.0394557Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T09:33:42.0395697Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T09:33:42.0396696Z * [new tag] 
ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T09:33:42.0397507Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T09:33:42.0398366Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T09:33:42.0399675Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T09:33:42.0400450Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T09:33:42.0404492Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T09:33:42.0405643Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T09:33:42.0406481Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T09:33:42.0407673Z * [new tag] ciflow/inductor/169400 -> ciflow/inductor/169400 2025-12-04T09:33:42.0408438Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T09:33:42.0409510Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T09:33:42.0410198Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T09:33:42.0411424Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T09:33:42.0412212Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T09:33:42.0413056Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T09:33:42.0414165Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T09:33:42.0414948Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T09:33:42.0415815Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T09:33:42.0416798Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T09:33:42.0417753Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T09:33:42.0418592Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T09:33:42.0419659Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T09:33:42.0420472Z * [new tag] 
ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T09:33:42.0421549Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T09:33:42.0422633Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T09:33:42.0423393Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T09:33:42.0424344Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T09:33:42.0425145Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T09:33:42.0426101Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T09:33:42.0426895Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T09:33:42.0427748Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T09:33:42.0429295Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T09:33:42.0430825Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T09:33:42.0431615Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T09:33:42.0432647Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T09:33:42.0433432Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T09:33:42.0434444Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T09:33:42.0435188Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T09:33:42.0436240Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T09:33:42.0437068Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T09:33:42.0438050Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T09:33:42.0438825Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T09:33:42.0439683Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T09:33:42.0440643Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T09:33:42.0441684Z * [new tag] 
ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T09:33:42.0442544Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T09:33:42.0443593Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T09:33:42.0444456Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 2025-12-04T09:33:42.0445289Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T09:33:42.0446160Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T09:33:42.0447062Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T09:33:42.0447939Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T09:33:42.0448814Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T09:33:42.0450271Z * [new tag] ciflow/inductor/169557 -> ciflow/inductor/169557 2025-12-04T09:33:42.0451387Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T09:33:42.0452491Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T09:33:42.0453532Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T09:33:42.0454586Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T09:33:42.0455326Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T09:33:42.0456171Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T09:33:42.0456968Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T09:33:42.0458019Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T09:33:42.0458766Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T09:33:42.0459868Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T09:33:42.0460591Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T09:33:42.0461743Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T09:33:42.0462506Z * [new tag] 
ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T09:33:42.0463560Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T09:33:42.0464512Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T09:33:42.0465478Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T09:33:42.0466271Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T09:33:42.0467150Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T09:33:42.0468113Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T09:33:42.0468924Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T09:33:42.0469741Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T09:33:42.0471023Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T09:33:42.0471782Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T09:33:42.0472761Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T09:33:42.0473515Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T09:33:42.0474399Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T09:33:42.0475545Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T09:33:42.0476523Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T09:33:42.0477470Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T09:33:42.0478592Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T09:33:42.0480243Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:33:42.0481249Z * [new tag] ciflow/periodic/94512-point -> 
ciflow/periodic/94512-point 2025-12-04T09:33:42.0482660Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T09:33:42.0483777Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T09:33:42.0484697Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T09:33:42.0485872Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T09:33:42.0487080Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T09:33:42.0488295Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T09:33:42.0489331Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T09:33:42.0490752Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T09:33:42.0491499Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T09:33:42.0492344Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T09:33:42.0493135Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T09:33:42.0494227Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T09:33:42.0494935Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T09:33:42.0496037Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T09:33:42.0496770Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T09:33:42.0497837Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T09:33:42.0498864Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T09:33:42.0499580Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T09:33:42.0500394Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T09:33:42.0501362Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T09:33:42.0502514Z * 
[new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T09:33:42.0503302Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T09:33:42.0504108Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T09:33:42.0505234Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T09:33:42.0505909Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T09:33:42.0506979Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T09:33:42.0507722Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T09:33:42.0508524Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T09:33:42.0509369Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T09:33:42.0510147Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T09:33:42.0510990Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T09:33:42.0511904Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T09:33:42.0512631Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T09:33:42.0513449Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T09:33:42.0514424Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T09:33:42.0515185Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T09:33:42.0516418Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T09:33:42.0517520Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 2025-12-04T09:33:42.0518570Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T09:33:42.0519343Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T09:33:42.0520343Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T09:33:42.0521088Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T09:33:42.0521947Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T09:33:42.0523021Z * [new tag] ciflow/rocm/169216 -> 
ciflow/rocm/169216 2025-12-04T09:33:42.0523824Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T09:33:42.0524644Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T09:33:42.0525604Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T09:33:42.0526391Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T09:33:42.0527341Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T09:33:42.0528122Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T09:33:42.0528962Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T09:33:42.0529812Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T09:33:42.0531172Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T09:33:42.0532109Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T09:33:42.0533704Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T09:33:42.0534170Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T09:33:42.0535024Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T09:33:42.0535803Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T09:33:42.0536903Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T09:33:42.0538050Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T09:33:42.0539350Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T09:33:42.0540624Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T09:33:42.0541801Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T09:33:42.0542886Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T09:33:42.0543937Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T09:33:42.0545041Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T09:33:42.0546698Z * [new tag] 
ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T09:33:42.0547242Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T09:33:42.0548542Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T09:33:42.0549276Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T09:33:42.0550428Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T09:33:42.0551497Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T09:33:42.0553135Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T09:33:42.0553713Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T09:33:42.0554787Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T09:33:42.0555543Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T09:33:42.0556352Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T09:33:42.0557167Z * [new tag] ciflow/trunk/159718 -> ciflow/trunk/159718 2025-12-04T09:33:42.0557974Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T09:33:42.0558795Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T09:33:42.0559636Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T09:33:42.0560428Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T09:33:42.0561279Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T09:33:42.0562090Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T09:33:42.0563675Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T09:33:42.0564991Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T09:33:42.0566213Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T09:33:42.0567299Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T09:33:42.0568100Z * 
[new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T09:33:42.0569055Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T09:33:42.0569846Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T09:33:42.0570976Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T09:33:42.0571763Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T09:33:42.0572615Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T09:33:42.0573584Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T09:33:42.0574583Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T09:33:42.0575516Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T09:33:42.0576350Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T09:33:42.0577317Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T09:33:42.0578340Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T09:33:42.0579123Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T09:33:42.0580134Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T09:33:42.0581197Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T09:33:42.0582007Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T09:33:42.0582992Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T09:33:42.0583820Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T09:33:42.0584626Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T09:33:42.0585632Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T09:33:42.0586378Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T09:33:42.0587234Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T09:33:42.0588090Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T09:33:42.0588964Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 
2025-12-04T09:33:42.0590079Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T09:33:42.0591092Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T09:33:42.0591884Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T09:33:42.0592746Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T09:33:42.0593864Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T09:33:42.0594652Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T09:33:42.0595637Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T09:33:42.0596393Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 2025-12-04T09:33:42.0597514Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T09:33:42.0598271Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T09:33:42.0599223Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T09:33:42.0600320Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T09:33:42.0601217Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T09:33:42.0602292Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T09:33:42.0603251Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T09:33:42.0604232Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T09:33:42.0605009Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T09:33:42.0605866Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T09:33:42.0606800Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T09:33:42.0607626Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T09:33:42.0608761Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T09:33:42.0609753Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T09:33:42.0610806Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T09:33:42.0611621Z * [new tag] ciflow/trunk/169151 -> 
ciflow/trunk/169151 2025-12-04T09:33:42.0612621Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T09:33:42.0613634Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T09:33:42.0614420Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T09:33:42.0615409Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T09:33:42.0616172Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T09:33:42.0617378Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T09:33:42.0618342Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T09:33:42.0619621Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T09:33:42.0620393Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T09:33:42.0621260Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T09:33:42.0622221Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T09:33:42.0623297Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T09:33:42.0624665Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T09:33:42.0625479Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T09:33:42.0626428Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T09:33:42.0627258Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T09:33:42.0628508Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T09:33:42.0629318Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T09:33:42.0630312Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T09:33:42.0631139Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T09:33:42.0632252Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T09:33:42.0633032Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T09:33:42.0634014Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T09:33:42.0635010Z * [new tag] 
ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T09:33:42.0636018Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 2025-12-04T09:33:42.0636830Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T09:33:42.0637688Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T09:33:42.0638680Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T09:33:42.0639451Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T09:33:42.0640495Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T09:33:42.0641228Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T09:33:42.0642151Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T09:33:42.0643221Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T09:33:42.0643974Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T09:33:42.0644959Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T09:33:42.0646068Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T09:33:42.0646862Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T09:33:42.0647860Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T09:33:42.0648880Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T09:33:42.0649679Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T09:33:42.0650540Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T09:33:42.0651940Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T09:33:42.0652293Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T09:33:42.0653302Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T09:33:42.0654134Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T09:33:42.0654893Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T09:33:42.0655887Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 
2025-12-04T09:33:42.0656665Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T09:33:42.0657800Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T09:33:42.0659116Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T09:33:42.0660619Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T09:33:42.0661359Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T09:33:42.0662150Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T09:33:42.0663191Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T09:33:42.0663891Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T09:33:42.0664995Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T09:33:42.0665678Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T09:33:42.0666506Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T09:33:42.0667580Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T09:33:42.0668254Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T09:33:42.0669247Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T09:33:42.0670059Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T09:33:42.0670735Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T09:33:42.0671566Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T09:33:42.0672382Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T09:33:42.0673182Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T09:33:42.0674415Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T09:33:42.0675697Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 2025-12-04T09:33:42.0676768Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T09:33:42.0677545Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T09:33:42.0678572Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T09:33:42.0679286Z * [new tag] 
ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T09:33:42.0680393Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T09:33:42.0681161Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T09:33:42.0682122Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T09:33:42.0683325Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T09:33:42.0684023Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T09:33:42.0684881Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T09:33:42.0685986Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T09:33:42.0686947Z * [new tag] cslpull75 -> cslpull75 2025-12-04T09:33:42.0687773Z * [new tag] cslpull76 -> cslpull76 2025-12-04T09:33:42.0688797Z * [new tag] cslpull77 -> cslpull77 2025-12-04T09:33:42.0689861Z * [new tag] cslpull78 -> cslpull78 2025-12-04T09:33:42.0690947Z * [new tag] cslpull79 -> cslpull79 2025-12-04T09:33:42.0692302Z * [new tag] cslpull80 -> cslpull80 2025-12-04T09:33:42.0693331Z * [new tag] cslpull81 -> cslpull81 2025-12-04T09:33:42.0694322Z * [new tag] cslpull82 -> cslpull82 2025-12-04T09:33:42.0695292Z * [new tag] cslpull83 -> cslpull83 2025-12-04T09:33:42.0696266Z * [new tag] cslpull84 -> cslpull84 2025-12-04T09:33:42.0697074Z * [new tag] cslpull85 -> cslpull85 2025-12-04T09:33:42.0698270Z * [new tag] cslpull86 -> cslpull86 2025-12-04T09:33:42.0699260Z * [new tag] cslpull87 -> cslpull87 2025-12-04T09:33:42.0700269Z * [new tag] cslpull88 -> cslpull88 2025-12-04T09:33:42.0701143Z * [new tag] cslpull89 -> cslpull89 2025-12-04T09:33:42.0702198Z * [new tag] cslpull90 -> cslpull90 2025-12-04T09:33:42.0703588Z * [new tag] cslpull91 -> cslpull91 2025-12-04T09:33:42.0704495Z * [new tag] cslpull92 -> cslpull92 2025-12-04T09:33:42.0705636Z * [new tag] flight_5 -> flight_5 2025-12-04T09:33:42.0706799Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T09:33:42.0707785Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T09:33:42.0708865Z * [new tag] flight_5.3 -> 
flight_5.3 2025-12-04T09:33:42.0709912Z * [new tag] forpull1 -> forpull1 2025-12-04T09:33:42.0711148Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T09:33:42.0712116Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T09:33:42.0713137Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T09:33:42.0714178Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T09:33:42.0715235Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T09:33:42.0716378Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T09:33:42.0717752Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T09:33:42.0718805Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T09:33:42.0720234Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T09:33:42.0721651Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T09:33:42.0722690Z * [new tag] trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T09:33:42.0723969Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T09:33:42.0724948Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T09:33:42.0725930Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T09:33:42.0727265Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T09:33:42.0728334Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T09:33:42.0729290Z * [new tag] 
trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T09:33:42.0730348Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T09:33:42.0731449Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T09:33:42.0732459Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T09:33:42.0733776Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T09:33:42.0735330Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T09:33:42.0736311Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T09:33:42.0737350Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T09:33:42.0738396Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T09:33:42.0739391Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T09:33:42.0740462Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T09:33:42.0741684Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T09:33:42.0744381Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T09:33:42.0744872Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T09:33:42.0745335Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> 
trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T09:33:42.0745821Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T09:33:42.0746673Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T09:33:42.0747683Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T09:33:42.0748931Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T09:33:42.0749749Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 2025-12-04T09:33:42.0750787Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T09:33:42.0751834Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T09:33:42.0752922Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T09:33:42.0753941Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T09:33:42.0754811Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T09:33:42.0755804Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T09:33:42.0756911Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T09:33:42.0758202Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T09:33:42.0759172Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 
2025-12-04T09:33:42.0760410Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T09:33:42.0761334Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T09:33:42.0762508Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T09:33:42.0763602Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T09:33:42.0765542Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T09:33:42.0766109Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T09:33:42.0766759Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T09:33:42.0767781Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T09:33:42.0768597Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T09:33:42.0769663Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T09:33:42.0770684Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T09:33:42.0771690Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T09:33:42.0772766Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T09:33:42.0773794Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T09:33:42.0774844Z * [new tag] 
trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T09:33:42.0775772Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T09:33:42.0777173Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T09:33:42.0778008Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T09:33:42.0779107Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T09:33:42.0780231Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T09:33:42.0781259Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T09:33:42.0782135Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T09:33:42.0783291Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T09:33:42.0784421Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T09:33:42.0786053Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T09:33:42.0787032Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T09:33:42.0788063Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T09:33:42.0789091Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T09:33:42.0790281Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> 
trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T09:33:42.0791361Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T09:33:42.0792387Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T09:33:42.0793574Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T09:33:42.0794571Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T09:33:42.0795396Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T09:33:42.0796468Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T09:33:42.0797565Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T09:33:42.0798497Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T09:33:42.0799597Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T09:33:42.0800658Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T09:33:42.0804703Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T09:33:42.0806168Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T09:33:42.0807054Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T09:33:42.0808164Z * [new tag] trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 
2025-12-04T09:33:42.0809301Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T09:33:42.0810339Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T09:33:42.0811397Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T09:33:42.0812397Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T09:33:42.0813508Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T09:33:42.0814528Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T09:33:42.0815557Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T09:33:42.0816774Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T09:33:42.0817729Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T09:33:42.0818710Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T09:33:42.0819712Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T09:33:42.0820829Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T09:33:42.0821938Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T09:33:42.0822945Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T09:33:42.0823984Z * [new tag] 
trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T09:33:42.0825096Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T09:33:42.0826216Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T09:33:42.0827137Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T09:33:42.0828130Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T09:33:42.0829216Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T09:33:42.0830426Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T09:33:42.0831279Z * [new tag] trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f 2025-12-04T09:33:42.0832361Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T09:33:42.0833472Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T09:33:42.0834559Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T09:33:42.0835399Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 2025-12-04T09:33:42.0836422Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T09:33:42.0837450Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T09:33:42.0838463Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> 
trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T09:33:42.0839645Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T09:33:42.0840488Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T09:33:42.0841487Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T09:33:42.0842527Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T09:33:42.0843713Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T09:33:42.0844664Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T09:33:42.0845610Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T09:33:42.0846686Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T09:33:42.0848106Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T09:33:42.0849578Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T09:33:42.0850557Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T09:33:42.0851468Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T09:33:42.0852854Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T09:33:42.0853827Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 
2025-12-04T09:33:42.0854835Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T09:33:42.0855953Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T09:33:42.0857048Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T09:33:42.0858137Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T09:33:42.0859032Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T09:33:42.0859927Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T09:33:42.0861029Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T09:33:42.0862079Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T09:33:42.0863186Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T09:33:42.0864325Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T09:33:42.0865398Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T09:33:42.0866455Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T09:33:42.0867472Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T09:33:42.0868530Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T09:33:42.0869643Z * [new tag] 
trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T09:33:42.0870832Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T09:33:42.0871862Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T09:33:42.0873124Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T09:33:42.0874610Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T09:33:42.0875556Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T09:33:42.0876557Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T09:33:42.0877441Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T09:33:42.0878599Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T09:33:42.0879670Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T09:33:42.0880810Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T09:33:42.0881708Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T09:33:42.0882775Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T09:33:42.0883917Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T09:33:42.0885039Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> 
trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T09:33:42.0886101Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T09:33:42.0887200Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T09:33:42.0888285Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T09:33:42.0889385Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T09:33:42.0890489Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T09:33:42.0891523Z * [new tag] trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T09:33:42.0892748Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T09:33:42.0893858Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T09:33:42.0894999Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T09:33:42.0895947Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T09:33:42.0896991Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T09:33:42.0898077Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T09:33:42.0899451Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T09:33:42.0900395Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 
2025-12-04T09:33:42.0901893Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T09:33:42.0902944Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T09:33:42.0904050Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T09:33:42.0905502Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T09:33:42.0906469Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T09:33:42.0907601Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T09:33:42.0915906Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T09:33:42.0916552Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T09:33:42.0917168Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T09:33:42.0917925Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T09:33:42.0918408Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T09:33:42.0918985Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T09:33:42.0919474Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T09:33:42.0920021Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T09:33:42.0920535Z * [new tag] 
trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T09:33:42.0921119Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T09:33:42.0921598Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 2025-12-04T09:33:42.0922230Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T09:33:42.0922717Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T09:33:42.0923261Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T09:33:42.0923816Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T09:33:42.0924994Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T09:33:42.0926014Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T09:33:42.0927072Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T09:33:42.0928224Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T09:33:42.0929343Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T09:33:42.0930389Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T09:33:42.0931449Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T09:33:42.0932493Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> 
trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T09:33:42.0933616Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T09:33:42.0935008Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T09:33:42.0935823Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T09:33:42.0936966Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T09:33:42.0938096Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T09:33:42.0939264Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T09:33:42.0940353Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T09:33:42.0941517Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T09:33:42.0942499Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T09:33:42.0944077Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T09:33:42.0945587Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T09:33:42.0946727Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T09:33:42.0947754Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T09:33:42.0948803Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 
2025-12-04T09:33:42.0949912Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T09:33:42.0950958Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T09:33:42.0951963Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T09:33:42.0953022Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T09:33:42.0954149Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T09:33:42.0955080Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T09:33:42.0956229Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T09:33:42.0957339Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T09:33:42.0959028Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T09:33:42.0960084Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T09:33:42.0961300Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:42.0962040Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T09:33:42.0963233Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T09:33:42.0964142Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T09:33:42.0965224Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T09:33:42.0966152Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T09:33:42.0967192Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T09:33:42.0967997Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T09:33:42.0969037Z * [new tag] v0.1.5 -> v0.1.5 
2025-12-04T09:33:42.0970115Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T09:33:42.0971068Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T09:33:42.0971931Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T09:33:42.0972963Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T09:33:42.0973974Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T09:33:42.0975070Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T09:33:42.0976138Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T09:33:42.0977035Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T09:33:42.0978060Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T09:33:42.0979116Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T09:33:42.0980001Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T09:33:42.0981072Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T09:33:42.0982142Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T09:33:42.0982924Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T09:33:42.0983855Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T09:33:42.0984959Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T09:33:42.0986186Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T09:33:42.0987327Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T09:33:42.0988269Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T09:33:42.0989053Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T09:33:42.0990144Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T09:33:42.0990959Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T09:33:42.0991695Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T09:33:42.0992519Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T09:33:42.0993602Z * [new tag] v1.11.0 -> v1.11.0 2025-12-04T09:33:42.0995286Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T09:33:42.0996432Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T09:33:42.0997541Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T09:33:42.0998647Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T09:33:42.0999752Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T09:33:42.1000423Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T09:33:42.1001427Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 
2025-12-04T09:33:42.1002978Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T09:33:42.1003923Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T09:33:42.1004994Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T09:33:42.1006059Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T09:33:42.1007123Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T09:33:42.1008164Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T09:33:42.1009336Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T09:33:42.1010099Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T09:33:42.1010908Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T09:33:42.1011676Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T09:33:42.1012888Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T09:33:42.1013957Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T09:33:42.1015072Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T09:33:42.1016102Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T09:33:42.1016873Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T09:33:42.1017989Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T09:33:42.1018949Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T09:33:42.1019990Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T09:33:42.1021081Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T09:33:42.1022202Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T09:33:42.1022981Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T09:33:42.1023748Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T09:33:42.1024887Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T09:33:42.1025641Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T09:33:42.1026701Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T09:33:42.1027756Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T09:33:42.1028669Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T09:33:42.1029765Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T09:33:42.1030549Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T09:33:42.1031592Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T09:33:42.1032509Z * [new tag] 
v1.4.0a0 -> v1.4.0a0 2025-12-04T09:33:42.1033307Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T09:33:42.1034544Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T09:33:42.1035681Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T09:33:42.1036772Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T09:33:42.1037883Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T09:33:42.1038813Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T09:33:42.1039606Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T09:33:42.1040711Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T09:33:42.1041516Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T09:33:42.1042383Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T09:33:42.1043585Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T09:33:42.1044899Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T09:33:42.1045957Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T09:33:42.1047029Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T09:33:42.1047978Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T09:33:42.1049046Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T09:33:42.1049787Z * [new tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T09:33:42.1050937Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T09:33:42.1052046Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T09:33:42.1053149Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T09:33:42.1054217Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T09:33:42.1054984Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T09:33:42.1056095Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T09:33:42.1057230Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T09:33:42.1058371Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T09:33:42.1059146Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T09:33:42.1060680Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T09:33:42.1061540Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T09:33:42.1062636Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T09:33:42.1063732Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T09:33:42.1064653Z * [new tag] v1.8.0-rc4 -> 
v1.8.0-rc4 2025-12-04T09:33:42.1065442Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T09:33:42.1066267Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T09:33:42.1067487Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T09:33:42.1068284Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T09:33:42.1069091Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T09:33:42.1070705Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T09:33:42.1071547Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T09:33:42.1072630Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T09:33:42.1073718Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T09:33:42.1074847Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T09:33:42.1075950Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T09:33:42.1076717Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T09:33:42.1077824Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T09:33:42.1079085Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T09:33:42.1079876Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T09:33:42.1081022Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T09:33:42.1081974Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T09:33:42.1083213Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T09:33:42.1084366Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T09:33:42.1085285Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T09:33:42.1086404Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T09:33:42.1087246Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T09:33:42.1088373Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T09:33:42.1089472Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T09:33:42.1090220Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T09:33:42.1091252Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T09:33:42.1092004Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T09:33:42.1093669Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T09:33:42.1094746Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T09:33:42.1095852Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T09:33:42.1096999Z * [new tag] v2.1.0-rc3 -> 
v2.1.0-rc3 2025-12-04T09:33:42.1098075Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T09:33:42.1099149Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T09:33:42.1099871Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T09:33:42.1101328Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T09:33:42.1102630Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T09:33:42.1103680Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T09:33:42.1104848Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T09:33:42.1105950Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T09:33:42.1106885Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T09:33:42.1107683Z * [new tag] v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T09:33:42.1108757Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T09:33:42.1109962Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T09:33:42.1111067Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T09:33:42.1111860Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T09:33:42.1112972Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T09:33:42.1114035Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T09:33:42.1114985Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T09:33:42.1116071Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T09:33:42.1117027Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T09:33:42.1118092Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T09:33:42.1119041Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T09:33:42.1119844Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T09:33:42.1120648Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T09:33:42.1121839Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T09:33:42.1123179Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T09:33:42.1123955Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T09:33:42.1124745Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T09:33:42.1125592Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T09:33:42.1127279Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T09:33:42.1128102Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T09:33:42.1128929Z * [new tag] 
v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T09:33:42.1130217Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T09:33:42.1131172Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T09:33:42.1132275Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T09:33:42.1133426Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T09:33:42.1134368Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T09:33:42.1135437Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T09:33:42.1136577Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T09:33:42.1137601Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T09:33:42.1138686Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T09:33:42.1139437Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T09:33:42.1140586Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T09:33:42.1141646Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T09:33:42.1142388Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T09:33:42.1143184Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T09:33:42.1144359Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T09:33:42.1145434Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T09:33:42.1146549Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T09:33:42.1147608Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T09:33:42.1148701Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T09:33:42.1149634Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T09:33:42.1150720Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T09:33:42.1151823Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T09:33:42.1152933Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T09:33:42.1154032Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T09:33:42.1155113Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T09:33:42.1156201Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T09:33:42.1157286Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T09:33:42.1158074Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T09:33:42.1159268Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T09:33:42.1160408Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 
2025-12-04T09:33:42.1161531Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T09:33:42.1162695Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T09:33:42.1163827Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T09:33:42.1164590Z * [new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T09:33:42.1165678Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T09:33:42.1166794Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T09:33:42.1167915Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T09:33:42.1168981Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T09:33:42.1170115Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T09:33:42.1171142Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T09:33:42.1172241Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T09:33:42.1173412Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T09:33:42.1174119Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T09:33:42.1174887Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T09:33:42.1175737Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T09:33:42.1176919Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T09:33:42.1178093Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T09:33:42.1179242Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T09:33:42.1180188Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T09:33:42.1181526Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T09:33:42.1182737Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T09:33:42.1183864Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T09:33:42.1185087Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T09:33:42.1186181Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T09:33:42.1187441Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T09:33:42.1188484Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T09:33:42.1189363Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T09:33:42.1190558Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T09:33:42.1191746Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T09:33:42.1192862Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T09:33:42.1193903Z * [new tag] 
v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T09:33:42.1195432Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T09:33:42.1196566Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T09:33:42.1197713Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T09:33:42.1198904Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T09:33:42.1199669Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T09:33:42.1200966Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T09:33:42.1205448Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T09:33:42.1206770Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T09:33:42.1207927Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T09:33:42.1208997Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T09:33:42.1209843Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T09:33:42.1211030Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T09:33:42.1212112Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T09:33:42.1213428Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T09:33:42.1214608Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T09:33:42.1215779Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T09:33:42.1216926Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T09:33:42.1218034Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T09:33:42.1219117Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T09:33:42.1220264Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T09:33:42.1221382Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T09:33:42.1222651Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T09:33:42.1223617Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T09:33:42.1225049Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T09:33:42.1226178Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T09:33:42.1227328Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T09:33:42.1228447Z * [new tag] v2.9.0-rc5 -> v2.9.0-rc5 2025-12-04T09:33:42.1229770Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T09:33:42.1230944Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T09:33:42.1232203Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 
2025-12-04T09:33:42.1233051Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T09:33:42.1233883Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T09:33:42.1235043Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T09:33:42.1236252Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T09:33:42.1237739Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T09:33:42.1238808Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T09:33:42.1239719Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T09:33:42.1240893Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T09:33:42.1241815Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T09:33:42.1243018Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T09:33:42.1244006Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T09:33:42.1244956Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T09:33:42.1246312Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T09:33:42.1247154Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T09:33:42.1248497Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T09:33:42.1249568Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T09:33:42.1251096Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T09:33:42.1252225Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T09:33:42.1253213Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T09:33:42.1254301Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T09:33:42.1255370Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T09:33:42.1256479Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T09:33:42.1257518Z * [new tag] 
viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T09:33:42.1258522Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T09:33:42.1259586Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T09:33:42.1260631Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T09:33:42.1261627Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T09:33:42.1262719Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T09:33:42.1263826Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T09:33:42.1265005Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T09:33:42.1265848Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T09:33:42.1266994Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T09:33:42.1268020Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T09:33:42.1269119Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T09:33:42.1270101Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T09:33:42.1271223Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T09:33:42.1272274Z * [new tag] viable/strict/1759456483 -> viable/strict/1759456483 2025-12-04T09:33:42.1273350Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T09:33:42.1274384Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T09:33:42.1275585Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T09:33:42.1276709Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T09:33:42.1277672Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T09:33:42.1278786Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T09:33:42.1279773Z * [new tag] viable/strict/1759476294 
-> viable/strict/1759476294 2025-12-04T09:33:42.1280823Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T09:33:42.1281855Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T09:33:42.1282984Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T09:33:42.1284013Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T09:33:42.1285042Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T09:33:42.1286355Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T09:33:42.1287407Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T09:33:42.1288454Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T09:33:42.1289528Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T09:33:42.1290559Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T09:33:42.1291684Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T09:33:42.1292698Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T09:33:42.1293942Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T09:33:42.1294950Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T09:33:42.1295941Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T09:33:42.1297039Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T09:33:42.1298120Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T09:33:42.1299156Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T09:33:42.1300234Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T09:33:42.1301509Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T09:33:42.1302371Z * [new tag] viable/strict/1759708178 -> 
viable/strict/1759708178 2025-12-04T09:33:42.1303610Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T09:33:42.1304463Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T09:33:42.1305653Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T09:33:42.1306771Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T09:33:42.1307813Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T09:33:42.1308838Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T09:33:42.1309999Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T09:33:42.1311062Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T09:33:42.1311894Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T09:33:42.1313058Z * [new tag] viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T09:33:42.1314199Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T09:33:42.1315272Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T09:33:42.1316393Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T09:33:42.1317476Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T09:33:42.1318325Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T09:33:42.1319890Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T09:33:42.1320961Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T09:33:42.1322018Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T09:33:42.1323176Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T09:33:42.1324271Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T09:33:42.1325305Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 
2025-12-04T09:33:42.1326354Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T09:33:42.1327417Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T09:33:42.1328487Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T09:33:42.1329604Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T09:33:42.1330619Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T09:33:42.1331639Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T09:33:42.1332865Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T09:33:42.1333705Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T09:33:42.1334849Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T09:33:42.1336233Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T09:33:42.1337284Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T09:33:42.1338377Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T09:33:42.1339397Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T09:33:42.1340233Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T09:33:42.1341357Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T09:33:42.1342473Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T09:33:42.1343464Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T09:33:42.1344569Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T09:33:42.1345668Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T09:33:42.1346894Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T09:33:42.1347989Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 
2025-12-04T09:33:42.1349028Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T09:33:42.1349860Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T09:33:42.1350975Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T09:33:42.1352158Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T09:33:42.1353264Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 2025-12-04T09:33:42.1354282Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T09:33:42.1355229Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T09:33:42.1356418Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T09:33:42.1357489Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T09:33:42.1358499Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T09:33:42.1359586Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T09:33:42.1360649Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T09:33:42.1361680Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T09:33:42.1363010Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T09:33:42.1364039Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T09:33:42.1365119Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T09:33:42.1366184Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T09:33:42.1367244Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T09:33:42.1368334Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T09:33:42.1369368Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T09:33:42.1370483Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 
2025-12-04T09:33:42.1371617Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T09:33:42.1372984Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T09:33:42.1374081Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T09:33:42.1375167Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T09:33:42.1376223Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T09:33:42.1377303Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T09:33:42.1378396Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T09:33:42.1379405Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T09:33:42.1380219Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T09:33:42.1381433Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T09:33:42.1382235Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T09:33:42.1383431Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T09:33:42.1384462Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T09:33:42.1385669Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T09:33:42.1386739Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T09:33:42.1387823Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T09:33:42.1389329Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T09:33:42.1390494Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T09:33:42.1392024Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T09:33:42.1393068Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T09:33:42.1394097Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 
2025-12-04T09:33:42.1395207Z * [new tag] viable/strict/1760575094 -> viable/strict/1760575094 2025-12-04T09:33:42.1396353Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T09:33:42.1398004Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T09:33:42.1399119Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T09:33:42.1399970Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T09:33:42.1401184Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T09:33:42.1402488Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T09:33:42.1403524Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T09:33:42.1404344Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T09:33:42.1405461Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T09:33:42.1406521Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T09:33:42.1407527Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T09:33:42.1408589Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T09:33:42.1409614Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T09:33:42.1410752Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T09:33:42.1411762Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T09:33:42.1412824Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T09:33:42.1413846Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T09:33:42.1415015Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T09:33:42.1416094Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T09:33:42.1418730Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 
2025-12-04T09:33:42.1419791Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T09:33:42.1420879Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T09:33:42.1422035Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T09:33:42.1423212Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T09:33:42.1424087Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T09:33:42.1425298Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T09:33:42.1426336Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T09:33:42.1427406Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T09:33:42.1428508Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T09:33:42.1429538Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T09:33:42.1430576Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T09:33:42.1431677Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T09:33:42.1432759Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T09:33:42.1433834Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T09:33:42.1434874Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T09:33:42.1435779Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T09:33:42.1436975Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T09:33:42.1438006Z * [new tag] viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T09:33:42.1439078Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T09:33:42.1440147Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T09:33:42.1441185Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 
2025-12-04T09:33:42.1442336Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T09:33:42.1443433Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T09:33:42.1444608Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T09:33:42.1445601Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T09:33:42.1446649Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T09:33:42.1447694Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T09:33:42.1448683Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T09:33:42.1449933Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T09:33:42.1451145Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T09:33:42.1452241Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T09:33:42.1453297Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T09:33:42.1454431Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T09:33:42.1455530Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T09:33:42.1456516Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T09:33:42.1457573Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T09:33:42.1458652Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T09:33:42.1459723Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T09:33:42.1461308Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T09:33:42.1462435Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T09:33:42.1463422Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T09:33:42.1465006Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 
2025-12-04T09:33:42.1466098Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T09:33:42.1467150Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T09:33:42.1468226Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T09:33:42.1469306Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T09:33:42.1470286Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T09:33:42.1471352Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T09:33:42.1472424Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T09:33:42.1473518Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T09:33:42.1474560Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T09:33:42.1475689Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T09:33:42.1476843Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T09:33:42.1477856Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T09:33:42.1479101Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 2025-12-04T09:33:42.1480233Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T09:33:42.1481298Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T09:33:42.1482440Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T09:33:42.1483587Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T09:33:42.1484644Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T09:33:42.1485774Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T09:33:42.1487155Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T09:33:42.1488250Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 
2025-12-04T09:33:42.1489324Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T09:33:42.1490380Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T09:33:42.1491471Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T09:33:42.1492517Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T09:33:42.1493543Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T09:33:42.1494684Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T09:33:42.1495791Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T09:33:42.1496938Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T09:33:42.1498128Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T09:33:42.1499215Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T09:33:42.1500294Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T09:33:42.1501580Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T09:33:42.1502580Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T09:33:42.1503680Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T09:33:42.1504876Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T09:33:42.1505937Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T09:33:42.1506976Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T09:33:42.1508087Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T09:33:42.1509177Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T09:33:42.1510278Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T09:33:42.1511328Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 
2025-12-04T09:33:42.1512393Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T09:33:42.1513657Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T09:33:42.1514718Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T09:33:42.1515866Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T09:33:42.1516940Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T09:33:42.1518145Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T09:33:42.1519264Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T09:33:42.1520471Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T09:33:42.1521954Z * [new tag] viable/strict/1761439254 -> viable/strict/1761439254 2025-12-04T09:33:42.1523281Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T09:33:42.1524523Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T09:33:42.1525785Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T09:33:42.1527301Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T09:33:42.1528456Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T09:33:42.1529534Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T09:33:42.1530618Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T09:33:42.1531738Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T09:33:42.1532626Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T09:33:42.1534311Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T09:33:42.1535731Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T09:33:42.1536871Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 
2025-12-04T09:33:42.1537932Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T09:33:42.1539023Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T09:33:42.1540208Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T09:33:42.1541240Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T09:33:42.1542286Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T09:33:42.1543197Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T09:33:42.1544310Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T09:33:42.1545418Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T09:33:42.1546484Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T09:33:42.1547558Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T09:33:42.1548724Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T09:33:42.1549753Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T09:33:42.1550823Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T09:33:42.1551936Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T09:33:42.1553131Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T09:33:42.1554239Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T09:33:42.1555329Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T09:33:42.1556472Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T09:33:42.1557315Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T09:33:42.1558303Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T09:33:42.1559509Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 
2025-12-04T09:33:42.1560579Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T09:33:42.1561611Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T09:33:42.1562978Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T09:33:42.1564066Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T09:33:42.1565148Z * [new tag] viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T09:33:42.1566216Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T09:33:42.1567313Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T09:33:42.1568479Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T09:33:42.1569667Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T09:33:42.1570828Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T09:33:42.1571955Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T09:33:42.1573201Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T09:33:42.1574311Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T09:33:42.1575464Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T09:33:42.1576556Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T09:33:42.1577745Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T09:33:42.1578860Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T09:33:42.1580060Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T09:33:42.1581063Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T09:33:42.1582141Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T09:33:42.1583417Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 
2025-12-04T09:33:42.1584403Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T09:33:42.1585620Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T09:33:42.1587273Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T09:33:42.1588422Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T09:33:42.1589514Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T09:33:42.1590587Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T09:33:42.1591776Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T09:33:42.1592897Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T09:33:42.1593996Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T09:33:42.1595124Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T09:33:42.1596423Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T09:33:42.1597562Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T09:33:42.1598613Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T09:33:42.1599735Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T09:33:42.1600944Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T09:33:42.1605848Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T09:33:42.1607010Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T09:33:42.1608380Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T09:33:42.1609570Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T09:33:42.1611164Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T09:33:42.1612279Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T09:33:42.1613464Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T09:33:42.1614625Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T09:33:42.1615870Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T09:33:42.1616864Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T09:33:42.1617986Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T09:33:42.1619143Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T09:33:42.1620291Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T09:33:42.1621379Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T09:33:42.1622537Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T09:33:42.1623708Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T09:33:42.1624797Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T09:33:42.1628236Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T09:33:42.1628492Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T09:33:42.1628789Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T09:33:42.1630214Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T09:33:42.1630455Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T09:33:42.1631541Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T09:33:42.1632811Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T09:33:42.1633968Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T09:33:42.1635114Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T09:33:42.1636205Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 
2025-12-04T09:33:42.1637462Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T09:33:42.1638632Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T09:33:42.1639878Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T09:33:42.1640875Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T09:33:42.1642062Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T09:33:42.1643343Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T09:33:42.1644533Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T09:33:42.1645658Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T09:33:42.1646732Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T09:33:42.1647929Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T09:33:42.1648994Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T09:33:42.1649913Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T09:33:42.1651447Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T09:33:42.1652626Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T09:33:42.1653748Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T09:33:42.1654865Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T09:33:42.1656005Z * [new tag] viable/strict/1762156183 -> viable/strict/1762156183 2025-12-04T09:33:42.1657124Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T09:33:42.1658236Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T09:33:42.1659356Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T09:33:42.1660482Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 
2025-12-04T09:33:42.1661617Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T09:33:42.1662683Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T09:33:42.1663827Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T09:33:42.1664980Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T09:33:42.1666331Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T09:33:42.1667578Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T09:33:42.1668496Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T09:33:42.1669716Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T09:33:42.1671010Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T09:33:42.1671854Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T09:33:42.1673097Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T09:33:42.1674410Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T09:33:42.1675653Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T09:33:42.1676881Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T09:33:42.1677848Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T09:33:42.1679101Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T09:33:42.1680315Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T09:33:42.1681504Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T09:33:42.1682572Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T09:33:42.1683821Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T09:33:42.1685317Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 
2025-12-04T09:33:42.1686250Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T09:33:42.1687207Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T09:33:42.1688445Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T09:33:42.1689626Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T09:33:42.1690528Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T09:33:42.1692063Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T09:33:42.1693255Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T09:33:42.1694417Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T09:33:42.1695610Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T09:33:42.1696775Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T09:33:42.1697965Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T09:33:42.1699154Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T09:33:42.1700264Z * [new tag] viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T09:33:42.1702050Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T09:33:42.1703285Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T09:33:42.1704528Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T09:33:42.1705639Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T09:33:42.1706771Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T09:33:42.1707924Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T09:33:42.1709190Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T09:33:42.1710321Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 
2025-12-04T09:33:42.1711515Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T09:33:42.1712763Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T09:33:42.1714109Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T09:33:42.1715050Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T09:33:42.1716204Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T09:33:42.1717358Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T09:33:42.1718644Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T09:33:42.1719840Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T09:33:42.1721046Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T09:33:42.1722677Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T09:33:42.1723938Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T09:33:42.1725136Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T09:33:42.1726239Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T09:33:42.1727420Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T09:33:42.1728313Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T09:33:42.1729477Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T09:33:42.1730603Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T09:33:42.1731529Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T09:33:42.1732760Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T09:33:42.1733908Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T09:33:42.1734829Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 
2025-12-04T09:33:42.1736009Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T09:33:42.1737098Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T09:33:42.1738257Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T09:33:42.1739436Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T09:33:42.1740596Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T09:33:42.1741708Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T09:33:42.1742860Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T09:33:42.1744008Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 2025-12-04T09:33:42.1745258Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T09:33:42.1746395Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T09:33:42.1747527Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T09:33:42.1748686Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T09:33:42.1749928Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T09:33:42.1751090Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T09:33:42.1752223Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T09:33:42.1753344Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T09:33:42.1754477Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T09:33:42.1755713Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T09:33:42.1756935Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T09:33:42.1758052Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T09:33:42.1759282Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 
2025-12-04T09:33:42.1760809Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T09:33:42.1762036Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T09:33:42.1763325Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T09:33:42.1764644Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T09:33:42.1765932Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T09:33:42.1767119Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T09:33:42.1768272Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T09:33:42.1769424Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T09:33:42.1770665Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T09:33:42.1771832Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T09:33:42.1773086Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T09:33:42.1774301Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T09:33:42.1775462Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T09:33:42.1776374Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T09:33:42.1777614Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T09:33:42.1778801Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T09:33:42.1779922Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T09:33:42.1781073Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T09:33:42.1782338Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T09:33:42.1784112Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T09:33:42.1785253Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 
2025-12-04T09:33:42.1786412Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T09:33:42.1787309Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T09:33:42.1788659Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T09:33:42.1789813Z * [new tag] viable/strict/1762890589 -> viable/strict/1762890589 2025-12-04T09:33:42.1791008Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T09:33:42.1792174Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T09:33:42.1793142Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T09:33:42.1794301Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T09:33:42.1795479Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T09:33:42.1796441Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T09:33:42.1797644Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T09:33:42.1798918Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T09:33:42.1800109Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T09:33:42.1801373Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T09:33:42.1802695Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T09:33:42.1804010Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T09:33:42.1805225Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T09:33:42.1806433Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T09:33:42.1807577Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T09:33:42.1808503Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T09:33:42.1809719Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 
2025-12-04T09:33:42.1810891Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T09:33:42.1812021Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T09:33:42.1813354Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T09:33:42.1814498Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T09:33:42.1815654Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T09:33:42.1816838Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T09:33:42.1818036Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T09:33:42.1819245Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T09:33:42.1820408Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T09:33:42.1821653Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T09:33:42.1822896Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T09:33:42.1823708Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T09:33:42.1824986Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T09:33:42.1826308Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T09:33:42.1827499Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T09:33:42.1828622Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T09:33:42.1829732Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T09:33:42.1830883Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T09:33:42.1832247Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T09:33:42.1833332Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T09:33:42.1834473Z * [new tag] viable/strict/1763094355 -> viable/strict/1763094355 
2025-12-04T09:33:42.1835624Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T09:33:42.1837229Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T09:33:42.1838437Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T09:33:42.1839655Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T09:33:42.1840769Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T09:33:42.1841903Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T09:33:42.1842758Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T09:33:42.1844084Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T09:33:42.1845259Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T09:33:42.1846358Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T09:33:42.1847518Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T09:33:42.1848638Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T09:33:42.1849831Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T09:33:42.1850968Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T09:33:42.1852073Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T09:33:42.1853940Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T09:33:42.1855036Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T09:33:42.1856409Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T09:33:42.1857276Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T09:33:42.1858588Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T09:33:42.1859790Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 
2025-12-04T09:33:42.1860807Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T09:33:42.1861825Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T09:33:42.1863037Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T09:33:42.1864272Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T09:33:42.1866237Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T09:33:42.1867380Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T09:33:42.1868563Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T09:33:42.1869650Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T09:33:42.1870819Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T09:33:42.1872005Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T09:33:42.1873367Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T09:33:42.1874569Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T09:33:42.1875763Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T09:33:42.1876790Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T09:33:42.1877925Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T09:33:42.1879076Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 2025-12-04T09:33:42.1880314Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T09:33:42.1881509Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T09:33:42.1882520Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T09:33:42.1883865Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T09:33:42.1885113Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 
2025-12-04T09:33:42.1886171Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T09:33:42.1887372Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T09:33:42.1888584Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T09:33:42.1889736Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T09:33:42.1890983Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T09:33:42.1892106Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T09:33:42.1893156Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T09:33:42.1894234Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T09:33:42.1895495Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T09:33:42.1896375Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T09:33:42.1897437Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T09:33:42.1898362Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T09:33:42.1899582Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T09:33:42.1900800Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T09:33:42.1902233Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T09:33:42.1903403Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T09:33:42.1904578Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T09:33:42.1905822Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T09:33:42.1907070Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T09:33:42.1908248Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T09:33:42.1909408Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 
2025-12-04T09:33:42.1910615Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T09:33:42.1911630Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T09:33:42.1913286Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T09:33:42.1914486Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T09:33:42.1915668Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T09:33:42.1916823Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T09:33:42.1918032Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T09:33:42.1919171Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T09:33:42.1920314Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T09:33:42.1921517Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T09:33:42.1922719Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T09:33:42.1923923Z * [new tag] viable/strict/1763616325 -> viable/strict/1763616325 2025-12-04T09:33:42.1925058Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T09:33:42.1926404Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T09:33:42.1927643Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T09:33:42.1928678Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T09:33:42.1929805Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T09:33:42.1931133Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T09:33:42.1932281Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T09:33:42.1933647Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T09:33:42.1934764Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 
2025-12-04T09:33:42.1935868Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T09:33:42.1936956Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T09:33:42.1938164Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T09:33:42.1939196Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T09:33:42.1940478Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T09:33:42.1941597Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T09:33:42.1942882Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T09:33:42.1944110Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T09:33:42.1945215Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T09:33:42.1946341Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T09:33:42.1947477Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T09:33:42.1948610Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T09:33:42.1949605Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T09:33:42.1950669Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T09:33:42.1951857Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T09:33:42.1953003Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T09:33:42.1954169Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T09:33:42.1955280Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T09:33:42.1956562Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T09:33:42.1957632Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T09:33:42.1958834Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 
2025-12-04T09:33:42.1959954Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T09:33:42.1961034Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T09:33:42.1962018Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T09:33:42.1963249Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T09:33:42.1964625Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T09:33:42.1965739Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T09:33:42.1966803Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T09:33:42.1968228Z * [new tag] viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T09:33:42.1969334Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T09:33:42.1970336Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T09:33:42.1971512Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T09:33:42.1972763Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T09:33:42.1973904Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T09:33:42.1975005Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T09:33:42.1976432Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T09:33:42.1977624Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T09:33:42.1978748Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T09:33:42.1979847Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T09:33:42.1980931Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T09:33:42.1982076Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T09:33:42.1983353Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 
2025-12-04T09:33:42.1984484Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T09:33:42.1985510Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T09:33:42.1986652Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T09:33:42.1988271Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T09:33:42.1989145Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T09:33:42.1990423Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T09:33:42.1991565Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T09:33:42.1992690Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T09:33:42.1993795Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T09:33:42.1994910Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T09:33:42.1996123Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T09:33:42.1997317Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T09:33:42.1998338Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T09:33:42.1999422Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T09:33:42.2000989Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T09:33:42.2002304Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T09:33:42.2003308Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T09:33:42.2004150Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T09:33:42.2005170Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T09:33:42.2006179Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T09:33:42.2007023Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 
2025-12-04T09:33:42.2008016Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T09:33:42.2008872Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T09:33:42.2010033Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 2025-12-04T09:33:42.2010773Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T09:33:42.2011767Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T09:33:42.2012846Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T09:33:42.2013710Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T09:33:42.2014741Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T09:33:42.2015577Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T09:33:42.2016592Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T09:33:42.2017462Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T09:33:42.2018447Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T09:33:42.2019326Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T09:33:42.2020341Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T09:33:42.2021218Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T09:33:42.2022218Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T09:33:42.2023070Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T09:33:42.2024124Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T09:33:42.2024982Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T09:33:42.2026257Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T09:33:42.2027432Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T09:33:42.2028840Z * [new tag] whc_flight_4 -> whc_flight_4 
2025-12-04T09:33:42.2887158Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T09:33:42.2917990Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:42.2923785Z ##[endgroup] 2025-12-04T09:33:42.2924561Z ##[group]Determining the checkout info 2025-12-04T09:33:42.2925482Z ##[endgroup] 2025-12-04T09:33:42.2930726Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T09:33:42.2966112Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T09:33:42.2993569Z ##[group]Checking out the ref 2025-12-04T09:33:42.2998213Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:43.3461707Z Updating files: 75% (15216/20121) 2025-12-04T09:33:43.3619747Z Updating files: 76% (15292/20121) 2025-12-04T09:33:43.3763218Z Updating files: 77% (15494/20121) 2025-12-04T09:33:43.3990839Z Updating files: 78% (15695/20121) 2025-12-04T09:33:43.4283926Z Updating files: 79% (15896/20121) 2025-12-04T09:33:43.4639915Z Updating files: 80% (16097/20121) 2025-12-04T09:33:43.4959656Z Updating files: 81% (16299/20121) 2025-12-04T09:33:43.5195473Z Updating files: 82% (16500/20121) 2025-12-04T09:33:43.5362498Z Updating files: 83% (16701/20121) 2025-12-04T09:33:43.5515450Z Updating files: 84% (16902/20121) 2025-12-04T09:33:43.5693288Z Updating files: 85% (17103/20121) 2025-12-04T09:33:43.5862142Z Updating files: 86% (17305/20121) 2025-12-04T09:33:43.6013958Z Updating files: 87% (17506/20121) 2025-12-04T09:33:43.6138214Z Updating files: 88% (17707/20121) 2025-12-04T09:33:43.6289684Z Updating files: 89% (17908/20121) 2025-12-04T09:33:43.6479461Z Updating files: 90% (18109/20121) 2025-12-04T09:33:43.6605615Z Updating files: 91% (18311/20121) 2025-12-04T09:33:43.6777168Z Updating files: 92% (18512/20121) 2025-12-04T09:33:43.6980619Z Updating files: 93% (18713/20121) 2025-12-04T09:33:43.7208256Z Updating files: 94% (18914/20121) 2025-12-04T09:33:43.7402288Z 
Updating files: 95% (19115/20121) 2025-12-04T09:33:43.7575425Z Updating files: 96% (19317/20121) 2025-12-04T09:33:43.7758337Z Updating files: 97% (19518/20121) 2025-12-04T09:33:43.8073447Z Updating files: 98% (19719/20121) 2025-12-04T09:33:43.8267213Z Updating files: 99% (19920/20121) 2025-12-04T09:33:43.8267600Z Updating files: 100% (20121/20121) 2025-12-04T09:33:43.8267954Z Updating files: 100% (20121/20121), done. 2025-12-04T09:33:43.8578213Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'. 2025-12-04T09:33:43.8578610Z 2025-12-04T09:33:43.8578861Z You are in 'detached HEAD' state. You can look around, make experimental 2025-12-04T09:33:43.8579511Z changes and commit them, and you can discard any commits you make in this 2025-12-04T09:33:43.8580163Z state without impacting any branches by switching back to a branch. 2025-12-04T09:33:43.8580564Z 2025-12-04T09:33:43.8580820Z If you want to create a new branch to retain commits you create, you may 2025-12-04T09:33:43.8581402Z do so (now or later) by using -c with the switch command. 
Example: 2025-12-04T09:33:43.8581757Z 2025-12-04T09:33:43.8581883Z git switch -c <new-branch-name> 2025-12-04T09:33:43.8582343Z 2025-12-04T09:33:43.8582477Z Or undo this operation with: 2025-12-04T09:33:43.8582686Z 2025-12-04T09:33:43.8582786Z git switch - 2025-12-04T09:33:43.8582950Z 2025-12-04T09:33:43.8583226Z Turn off this advice by setting config variable advice.detachedHead to false 2025-12-04T09:33:43.8583651Z 2025-12-04T09:33:43.8583996Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T09:33:43.8671150Z ##[endgroup] 2025-12-04T09:33:43.8671675Z ##[group]Setting up auth for fetching submodules 2025-12-04T09:33:43.8677787Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T09:33:43.8731836Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T09:33:43.8761176Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T09:33:43.8790744Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T09:33:43.8815889Z ##[endgroup] 2025-12-04T09:33:43.8816408Z ##[group]Fetching submodules 2025-12-04T09:33:43.8820464Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T09:33:43.9161241Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T09:33:43.9502009Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni' 2025-12-04T09:33:43.9504508Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16' 2025-12-04T09:33:43.9507880Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2025-12-04T09:33:43.9512387Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) 
registered for path 'third_party/NNPACK' 2025-12-04T09:33:43.9516721Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX' 2025-12-04T09:33:43.9522077Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator' 2025-12-04T09:33:43.9526394Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2025-12-04T09:33:43.9531272Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter' 2025-12-04T09:33:43.9536301Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T09:33:43.9542191Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T09:33:43.9547214Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T09:33:43.9552575Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T09:33:43.9558228Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T09:33:43.9564125Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T09:33:43.9569691Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T09:33:43.9575745Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T09:33:43.9583369Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) 
registered for path 'third_party/flatbuffers' 2025-12-04T09:33:43.9589298Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T09:33:43.9595723Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:33:43.9601854Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T09:33:43.9608623Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T09:33:43.9614807Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T09:33:43.9621350Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T09:33:43.9627970Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T09:33:43.9634881Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T09:33:43.9641771Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T09:33:43.9648842Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T09:33:43.9655818Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T09:33:43.9661738Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T09:33:43.9667389Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T09:33:43.9673472Z Submodule 'third_party/protobuf' 
(https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T09:33:43.9679583Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T09:33:43.9686140Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T09:33:43.9694336Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T09:33:43.9700395Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T09:33:43.9707080Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T09:33:43.9714003Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T09:33:43.9748511Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T09:33:44.2010666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T09:33:44.2011667Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T09:33:44.2048310Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T09:33:47.9644149Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T09:33:47.9646410Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T09:33:47.9648376Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T09:33:47.9650169Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 
2025-12-04T09:33:47.9652096Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T09:33:47.9654173Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T09:33:47.9656165Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T09:33:47.9658321Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T09:33:47.9660481Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T09:33:47.9662265Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T09:33:47.9664887Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T09:33:47.9666734Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T09:33:47.9750940Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T09:33:47.9752666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T09:33:47.9754317Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T09:33:47.9756023Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T09:33:47.9757736Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T09:33:47.9759441Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T09:33:48.1462410Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T09:33:48.1567564Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 
2025-12-04T09:34:09.3567168Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T09:34:09.3574991Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T09:34:09.3579746Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T09:34:09.3581411Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T09:34:09.3583006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2025-12-04T09:34:09.3584415Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T09:34:09.3585934Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T09:34:09.3587458Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T09:34:09.3589142Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T09:34:09.3591343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T09:34:09.4568250Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T09:34:13.4608759Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T09:34:13.4609690Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 
2025-12-04T09:34:13.4788605Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T09:34:13.4932352Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T09:34:13.5044561Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T09:34:13.5338383Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T09:34:13.6315382Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T09:34:13.6964997Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T09:34:14.5549519Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T09:34:14.7741573Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T09:34:14.7763993Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:34:14.7793541Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
2025-12-04T09:34:20.1125669Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T09:34:20.1407362Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T09:34:20.5544747Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:34:20.6134925Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T09:34:20.7259755Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T09:34:20.7822585Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T09:34:21.5312660Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T09:34:21.7129658Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T09:34:21.7154161Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T09:34:21.7157235Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:34:21.7160118Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:34:21.7163353Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T09:34:21.7166704Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T09:34:21.7170164Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 
'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:34:21.7173503Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T09:34:21.7206265Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T09:34:23.1000084Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T09:34:23.1001363Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T09:34:23.1002570Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T09:34:23.2001434Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T09:34:26.8271496Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T09:34:26.9272239Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
2025-12-04T09:34:30.1851965Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T09:34:30.5977215Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T09:34:30.7145983Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T09:34:31.4522088Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T09:34:31.5061566Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:34:31.5201155Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T09:34:31.6381088Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T09:34:31.7203548Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T09:34:31.7225615Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:34:31.7228169Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:34:31.7260021Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T09:34:36.5073685Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 
2025-12-04T09:34:36.7944589Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T09:34:37.4508170Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T09:34:37.6133104Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T09:34:37.6477676Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T09:34:37.6943071Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T09:34:37.7244068Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T09:34:37.7773092Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:34:37.7928707Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T09:34:37.7947733Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T09:34:37.7976337Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T09:34:56.1120282Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T09:34:56.1361042Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T09:34:56.2358721Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T09:34:56.2380323Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:34:56.2383032Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:34:56.2386197Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:34:56.2417488Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T09:34:57.0156938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T09:34:57.7030344Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2025-12-04T09:34:57.8088319Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T09:34:57.8106671Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:34:57.8109619Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:34:57.8112802Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:34:57.8116256Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:34:57.8119587Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:34:57.8123211Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:34:57.8126893Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:34:57.8130589Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:34:57.8134657Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:34:57.8167436Z Cloning into 
'/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T09:34:59.8366822Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 2025-12-04T09:34:59.8368302Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T09:34:59.8369897Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T09:34:59.8371264Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T09:34:59.8372595Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T09:34:59.8373983Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T09:34:59.8375569Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T09:34:59.9367380Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 
2025-12-04T09:35:06.3839549Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T09:35:06.4050853Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T09:35:06.4477534Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T09:35:06.4639349Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T09:35:06.4657720Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:06.4687263Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T09:35:06.7619235Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T09:35:06.7841327Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T09:35:06.8377004Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:35:06.9518417Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T09:35:06.9716110Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T09:35:06.9919236Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T09:35:06.9938491Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:06.9941624Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:06.9971821Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:35:09.3699722Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:35:09.6609397Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T09:35:09.7154364Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:35:09.7531569Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T09:35:09.8068815Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T09:35:09.8697341Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T09:35:09.9156214Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T09:35:10.0462575Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T09:35:10.5233739Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T09:35:10.5278188Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:10.5311011Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
2025-12-04T09:35:11.4379121Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T09:35:11.5208469Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T09:35:11.5232158Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:11.5235198Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:11.5238113Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:11.5241350Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:11.5245009Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:11.5248296Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:11.5251773Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:11.5255229Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:11.5287294Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 
2025-12-04T09:35:11.9726414Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T09:35:11.9728645Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T09:35:11.9730675Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T09:35:11.9732665Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T09:35:12.0727322Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T09:35:12.7888308Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T09:35:20.5998163Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T09:35:21.3395603Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T09:35:21.3870404Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T09:35:21.4069609Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T09:35:21.5283154Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T09:35:21.5447223Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T09:35:21.5622771Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T09:35:21.5811204Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T09:35:21.5829539Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:21.5832514Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:21.5863005Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T09:35:23.9628125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 
2025-12-04T09:35:24.2534934Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T09:35:24.3074263Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T09:35:24.8671885Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T09:35:24.8818554Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T09:35:25.1946633Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T09:35:25.1971963Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:25.1974837Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:25.2006181Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T09:35:25.7606332Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T09:35:26.2072398Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T09:35:26.2917370Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T09:35:26.3030794Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T09:35:26.3176732Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T09:35:26.3673581Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T09:35:26.4011347Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T09:35:26.4520166Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T09:35:26.4854091Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T09:35:26.4875926Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:26.4878726Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:26.4882076Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:26.4885258Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:26.4917649Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T09:35:27.7780474Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 
2025-12-04T09:35:27.7781636Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T09:35:27.7922466Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T09:35:27.8594469Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T09:35:27.8785707Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T09:35:27.9654440Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T09:35:27.9997326Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T09:35:28.0016643Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:28.0046923Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
2025-12-04T09:35:28.2178451Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T09:35:28.2220274Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T09:35:28.2561729Z Entering 'android/libs/fbjni' 2025-12-04T09:35:28.2610408Z Entering 'third_party/FP16' 2025-12-04T09:35:28.2657539Z Entering 'third_party/FXdiv' 2025-12-04T09:35:28.2705706Z Entering 'third_party/NNPACK' 2025-12-04T09:35:28.2756507Z Entering 'third_party/NVTX' 2025-12-04T09:35:28.2806939Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:28.2854837Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:28.2921237Z Entering 'third_party/aiter' 2025-12-04T09:35:28.2969635Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:28.3028406Z Entering 'third_party/benchmark' 2025-12-04T09:35:28.3076037Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:28.3134666Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:28.3182624Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:28.3231149Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:28.3281602Z Entering 'third_party/cutlass' 2025-12-04T09:35:28.3340956Z Entering 'third_party/fbgemm' 2025-12-04T09:35:28.3391538Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:28.3439037Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:28.3499619Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:28.3551451Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:28.3609975Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:28.3656612Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:28.3703608Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:28.3753734Z Entering 'third_party/flash-attention' 2025-12-04T09:35:28.3804563Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T09:35:28.3857322Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:28.3915579Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:28.3966777Z Entering 'third_party/fmt' 2025-12-04T09:35:28.4017311Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:28.4065071Z Entering 'third_party/gloo' 2025-12-04T09:35:28.4114055Z Entering 'third_party/googletest' 2025-12-04T09:35:28.4161730Z Entering 'third_party/ideep' 2025-12-04T09:35:28.4209008Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:28.4266438Z Entering 'third_party/ittapi' 2025-12-04T09:35:28.4316049Z Entering 'third_party/kineto' 2025-12-04T09:35:28.4363847Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:28.4411041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:28.4458568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:28.4506377Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:28.4553707Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:28.4600726Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:28.4649162Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:28.4697480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:28.4745853Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:28.4795863Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:28.4843265Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:28.4890588Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:28.4941962Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:28.4993895Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:28.5041873Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:28.5090644Z Entering 'third_party/kleidiai' 2025-12-04T09:35:28.5140682Z Entering 'third_party/mimalloc' 2025-12-04T09:35:28.5187272Z Entering 'third_party/nlohmann' 2025-12-04T09:35:28.5237072Z Entering 'third_party/onnx' 2025-12-04T09:35:28.5306785Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:28.5358384Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:28.5409819Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:28.5457242Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:28.5504831Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:28.5553276Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:28.5601771Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:28.5647537Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:28.5693320Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:28.5740678Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:28.5789139Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:28.5839005Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:28.5909249Z Entering 'third_party/pocketfft' 2025-12-04T09:35:28.5958275Z Entering 'third_party/protobuf' 2025-12-04T09:35:28.6012403Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:28.6058591Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:28.6108868Z Entering 'third_party/psimd' 
2025-12-04T09:35:28.6156535Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:28.6204617Z Entering 'third_party/pybind11' 2025-12-04T09:35:28.6253039Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:28.6302530Z Entering 'third_party/sleef' 2025-12-04T09:35:28.6350690Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:28.6398365Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:28.6444259Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:28.6489665Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:28.6537403Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:28.6584181Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:28.6647201Z ##[endgroup] 2025-12-04T09:35:28.6647764Z ##[group]Persisting credentials for submodules 2025-12-04T09:35:28.6654066Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T09:35:28.6994218Z Entering 'android/libs/fbjni' 2025-12-04T09:35:28.7060508Z Entering 'third_party/FP16' 2025-12-04T09:35:28.7125706Z Entering 'third_party/FXdiv' 2025-12-04T09:35:28.7189026Z Entering 'third_party/NNPACK' 2025-12-04T09:35:28.7251982Z Entering 'third_party/NVTX' 2025-12-04T09:35:28.7319065Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:28.7382588Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:28.7464397Z Entering 'third_party/aiter' 2025-12-04T09:35:28.7529150Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:28.7602768Z Entering 'third_party/benchmark' 2025-12-04T09:35:28.7666066Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:28.7738033Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:28.7801983Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:28.7865167Z Entering 
'third_party/cudnn_frontend' 2025-12-04T09:35:28.7928981Z Entering 'third_party/cutlass' 2025-12-04T09:35:28.8003688Z Entering 'third_party/fbgemm' 2025-12-04T09:35:28.8070321Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:28.8134484Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:28.8210112Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:28.8273819Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:28.8347099Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:28.8410459Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:28.8472778Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:28.8537755Z Entering 'third_party/flash-attention' 2025-12-04T09:35:28.8605258Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:28.8674871Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:28.8751146Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:28.8818734Z Entering 'third_party/fmt' 2025-12-04T09:35:28.8881608Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:28.8943967Z Entering 'third_party/gloo' 2025-12-04T09:35:28.9011908Z Entering 'third_party/googletest' 2025-12-04T09:35:28.9075687Z Entering 'third_party/ideep' 2025-12-04T09:35:28.9138205Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:28.9208965Z Entering 'third_party/ittapi' 2025-12-04T09:35:28.9272800Z Entering 'third_party/kineto' 2025-12-04T09:35:28.9340169Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:28.9402688Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:28.9465507Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:28.9530807Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:28.9592680Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 
2025-12-04T09:35:28.9654572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:28.9718906Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:28.9787182Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:28.9854604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:28.9920523Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:28.9987369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:29.0049029Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.0114988Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.0186304Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:29.0252704Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:29.0320687Z Entering 'third_party/kleidiai' 2025-12-04T09:35:29.0386237Z Entering 'third_party/mimalloc' 2025-12-04T09:35:29.0451577Z Entering 'third_party/nlohmann' 2025-12-04T09:35:29.0519786Z Entering 'third_party/onnx' 2025-12-04T09:35:29.0606277Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:29.0671535Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:29.0738020Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:29.0800459Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:29.0863081Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:29.0925575Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:29.0989916Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:29.1054504Z 
Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:29.1116890Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:29.1176688Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.1241317Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.1308555Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:29.1394501Z Entering 'third_party/pocketfft' 2025-12-04T09:35:29.1458373Z Entering 'third_party/protobuf' 2025-12-04T09:35:29.1527010Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:29.1589623Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:29.1656382Z Entering 'third_party/psimd' 2025-12-04T09:35:29.1720656Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:29.1785721Z Entering 'third_party/pybind11' 2025-12-04T09:35:29.1849662Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:29.1913163Z Entering 'third_party/sleef' 2025-12-04T09:35:29.1976451Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:29.2040210Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:29.2102440Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:29.2164568Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:29.2227963Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:29.2292862Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:29.2377157Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T09:35:29.2721476Z Entering 'android/libs/fbjni' 2025-12-04T09:35:29.2779251Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T09:35:29.2798757Z Entering 'third_party/FP16' 2025-12-04T09:35:29.2857832Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T09:35:29.2876575Z Entering 'third_party/FXdiv' 2025-12-04T09:35:29.2935467Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T09:35:29.2953744Z Entering 'third_party/NNPACK' 2025-12-04T09:35:29.3013781Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T09:35:29.3032330Z Entering 'third_party/NVTX' 2025-12-04T09:35:29.3092175Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T09:35:29.3112139Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:29.3171011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T09:35:29.3189784Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:29.3248962Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T09:35:29.3283644Z Entering 'third_party/aiter' 2025-12-04T09:35:29.3341864Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T09:35:29.3361185Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:29.3422647Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T09:35:29.3452793Z Entering 'third_party/benchmark' 2025-12-04T09:35:29.3512487Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:29.3530840Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:29.3589019Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T09:35:29.3617748Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:29.3676927Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T09:35:29.3695581Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:29.3754466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T09:35:29.3773751Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:29.3832843Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T09:35:29.3851292Z Entering 'third_party/cutlass' 2025-12-04T09:35:29.3910295Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T09:35:29.3940986Z Entering 'third_party/fbgemm' 2025-12-04T09:35:29.4001113Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T09:35:29.4021657Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:29.4080662Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T09:35:29.4098200Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:29.4157498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T09:35:29.4184664Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:29.4247995Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T09:35:29.4266487Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:29.4326552Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T09:35:29.4354311Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:29.4413495Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T09:35:29.4431228Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:29.4495754Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T09:35:29.4513228Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:29.4572813Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T09:35:29.4593739Z Entering 'third_party/flash-attention' 2025-12-04T09:35:29.4653174Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T09:35:29.4671157Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:29.4730091Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T09:35:29.4754793Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:29.4814540Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T09:35:29.4842132Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:29.4902310Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T09:35:29.4922560Z Entering 'third_party/fmt' 2025-12-04T09:35:29.4981625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:35:29.5000738Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:29.5059345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T09:35:29.5077986Z Entering 'third_party/gloo' 2025-12-04T09:35:29.5136262Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T09:35:29.5155031Z Entering 'third_party/googletest' 2025-12-04T09:35:29.5213398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:29.5231788Z Entering 'third_party/ideep' 2025-12-04T09:35:29.5291461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T09:35:29.5309242Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:29.5366514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T09:35:29.5394230Z Entering 'third_party/ittapi' 2025-12-04T09:35:29.5454040Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T09:35:29.5472507Z Entering 'third_party/kineto' 2025-12-04T09:35:29.5533838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T09:35:29.5552429Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:29.5613801Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T09:35:29.5631282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:29.5691545Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T09:35:29.5711287Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:29.5771350Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T09:35:29.5791114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:29.5851423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T09:35:29.5869034Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:29.5928633Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T09:35:29.5945545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:29.6005018Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T09:35:29.6024626Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:29.6084244Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T09:35:29.6103127Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:29.6161580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:29.6179368Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:29.6238383Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T09:35:29.6257416Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:29.6317434Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T09:35:29.6335594Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:29.6393172Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:35:29.6410116Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.6469714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:35:29.6489942Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.6552345Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:35:29.6575689Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:29.6633714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T09:35:29.6651059Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:29.6709098Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T09:35:29.6728795Z Entering 'third_party/kleidiai' 2025-12-04T09:35:29.6787798Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T09:35:29.6808980Z Entering 'third_party/mimalloc' 2025-12-04T09:35:29.6867300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T09:35:29.6886076Z Entering 'third_party/nlohmann' 2025-12-04T09:35:29.6945917Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T09:35:29.6965856Z Entering 'third_party/onnx' 2025-12-04T09:35:29.7025724Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T09:35:29.7065073Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:29.7124221Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:29.7145747Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:29.7208698Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T09:35:29.7228646Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:29.7287480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:29.7307171Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:29.7366930Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:29.7385735Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:29.7446232Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T09:35:29.7463897Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:29.7522827Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T09:35:29.7542194Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:29.7601712Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T09:35:29.7619651Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:29.7679011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T09:35:29.7696743Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:29.7755221Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T09:35:29.7772062Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:29.7831522Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T09:35:29.7851467Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:29.7909450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T09:35:29.7929464Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:29.7992202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T09:35:29.8033017Z Entering 'third_party/pocketfft' 2025-12-04T09:35:29.8091694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T09:35:29.8111974Z Entering 'third_party/protobuf' 2025-12-04T09:35:29.8170283Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T09:35:29.8192543Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:29.8258302Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T09:35:29.8276616Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:29.8336335Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T09:35:29.8357366Z Entering 'third_party/psimd' 2025-12-04T09:35:29.8416581Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T09:35:29.8435665Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:29.8495881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T09:35:29.8515391Z Entering 'third_party/pybind11' 2025-12-04T09:35:29.8573782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:29.8592842Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:29.8654416Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T09:35:29.8673711Z Entering 'third_party/sleef' 2025-12-04T09:35:29.8735242Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T09:35:29.8754118Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:29.8815644Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T09:35:29.8833865Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:29.8892286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T09:35:29.8912181Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:29.8970830Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T09:35:29.8988005Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:29.9046504Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T09:35:29.9064515Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:29.9124214Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T09:35:29.9140527Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:29.9199990Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T09:35:30.0198560Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T09:35:30.0543636Z Entering 'android/libs/fbjni' 2025-12-04T09:35:30.0592669Z Entering 'third_party/FP16' 2025-12-04T09:35:30.0641199Z Entering 'third_party/FXdiv' 2025-12-04T09:35:30.0689328Z Entering 'third_party/NNPACK' 2025-12-04T09:35:30.0737123Z Entering 'third_party/NVTX' 2025-12-04T09:35:30.0785302Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:30.0832580Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:30.0896326Z Entering 'third_party/aiter' 2025-12-04T09:35:30.0945852Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:30.1002622Z Entering 'third_party/benchmark' 2025-12-04T09:35:30.1050045Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:30.1108779Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:30.1157257Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:30.1207089Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:30.1256280Z Entering 'third_party/cutlass' 2025-12-04T09:35:30.1315410Z Entering 'third_party/fbgemm' 2025-12-04T09:35:30.1367215Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:30.1416641Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:30.1473698Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:30.1524483Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:30.1579776Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:30.1625876Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:30.1671800Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:30.1722147Z Entering 'third_party/flash-attention' 2025-12-04T09:35:30.1769927Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:30.1824237Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:30.1884037Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:30.1936314Z Entering 'third_party/fmt' 2025-12-04T09:35:30.1984155Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:30.2032255Z Entering 'third_party/gloo' 2025-12-04T09:35:30.2081039Z Entering 'third_party/googletest' 2025-12-04T09:35:30.2129802Z Entering 'third_party/ideep' 2025-12-04T09:35:30.2176208Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:30.2232639Z Entering 'third_party/ittapi' 2025-12-04T09:35:30.2279580Z Entering 'third_party/kineto' 2025-12-04T09:35:30.2327079Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:30.2374058Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:30.2423418Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:30.2471014Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:30.2519632Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:30.2565649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:30.2614439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:30.2664819Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:30.2712221Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:30.2759986Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:30.2807306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:30.2852807Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:30.2908324Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:30.2961401Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:30.3009119Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:30.3057804Z Entering 'third_party/kleidiai' 2025-12-04T09:35:30.3109748Z Entering 'third_party/mimalloc' 2025-12-04T09:35:30.3156917Z Entering 'third_party/nlohmann' 2025-12-04T09:35:30.3209844Z Entering 'third_party/onnx' 2025-12-04T09:35:30.3277697Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:30.3329732Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:30.3379281Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:30.3425112Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:30.3471716Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:30.3519626Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:30.3566647Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:30.3613903Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:30.3660553Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:30.3705806Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:30.3759287Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:30.3806982Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:30.3876200Z Entering 'third_party/pocketfft' 2025-12-04T09:35:30.3925496Z Entering 'third_party/protobuf' 2025-12-04T09:35:30.3977682Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:30.4025415Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:30.4074445Z Entering 'third_party/psimd' 2025-12-04T09:35:30.4122092Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:30.4169664Z Entering 'third_party/pybind11' 2025-12-04T09:35:30.4217088Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:30.4264343Z Entering 'third_party/sleef' 2025-12-04T09:35:30.4313096Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:30.4360079Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:30.4407192Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:30.4452625Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:30.4499337Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:30.4545400Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:30.4620605Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T09:35:30.4958060Z Entering 'android/libs/fbjni' 2025-12-04T09:35:30.5007170Z Entering 'third_party/FP16' 2025-12-04T09:35:30.5055172Z Entering 'third_party/FXdiv' 2025-12-04T09:35:30.5105120Z Entering 'third_party/NNPACK' 2025-12-04T09:35:30.5152345Z Entering 'third_party/NVTX' 2025-12-04T09:35:30.5200731Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:30.5248615Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:30.5314451Z Entering 
'third_party/aiter' 2025-12-04T09:35:30.5363242Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:30.5421195Z Entering 'third_party/benchmark' 2025-12-04T09:35:30.5469226Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:30.5526340Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:30.5574138Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:30.5622593Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:30.5670851Z Entering 'third_party/cutlass' 2025-12-04T09:35:30.5728865Z Entering 'third_party/fbgemm' 2025-12-04T09:35:30.5778912Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:30.5825674Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:30.5880970Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:30.5928885Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:30.5984802Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:30.6031345Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:30.6076895Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:30.6127225Z Entering 'third_party/flash-attention' 2025-12-04T09:35:30.6176732Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:30.6232360Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:30.6291960Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:30.6345564Z Entering 'third_party/fmt' 2025-12-04T09:35:30.6394401Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:30.6445134Z Entering 'third_party/gloo' 2025-12-04T09:35:30.6496681Z Entering 'third_party/googletest' 2025-12-04T09:35:30.6546494Z Entering 'third_party/ideep' 2025-12-04T09:35:30.6594122Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:30.6649601Z Entering 'third_party/ittapi' 2025-12-04T09:35:30.6697187Z Entering 'third_party/kineto' 2025-12-04T09:35:30.6746221Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 
2025-12-04T09:35:30.6795027Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:30.6845038Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:30.6892497Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:30.6938876Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:30.6983922Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:30.7035915Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:30.7083399Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:30.7130408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:30.7179146Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:30.7226592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:30.7274821Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:30.7326752Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:30.7378349Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:30.7425187Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:30.7473869Z Entering 'third_party/kleidiai' 2025-12-04T09:35:30.7525226Z Entering 'third_party/mimalloc' 2025-12-04T09:35:30.7573555Z Entering 'third_party/nlohmann' 2025-12-04T09:35:30.7623080Z Entering 'third_party/onnx' 2025-12-04T09:35:30.7691373Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:30.7744735Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:30.7798102Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:30.7844883Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:30.7893271Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:30.7939629Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:30.7987740Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:30.8036234Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:30.8081829Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:30.8126859Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:30.8174638Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:30.8221826Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:30.8293413Z Entering 'third_party/pocketfft' 2025-12-04T09:35:30.8342559Z Entering 'third_party/protobuf' 2025-12-04T09:35:30.8394164Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:30.8441099Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:30.8489341Z Entering 'third_party/psimd' 2025-12-04T09:35:30.8539068Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:30.8586424Z Entering 'third_party/pybind11' 2025-12-04T09:35:30.8635065Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:30.8682645Z Entering 'third_party/sleef' 2025-12-04T09:35:30.8733949Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:30.8782766Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:30.8830051Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:30.8876606Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:30.8923582Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:30.8970389Z Entering 
'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:30.9033941Z ##[endgroup] 2025-12-04T09:35:30.9073712Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T09:35:30.9099655Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:35:30.9205674Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T09:35:30.9206095Z cd "${GITHUB_WORKSPACE}" 2025-12-04T09:35:30.9206599Z # Clean stale submodule dirs 2025-12-04T09:35:30.9206977Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T09:35:30.9207435Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T09:35:30.9207877Z else 2025-12-04T09:35:30.9208228Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T09:35:30.9208676Z fi 2025-12-04T09:35:30.9216814Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:30.9217264Z env: 2025-12-04T09:35:30.9217526Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:30.9217817Z NO_SUDO: true 2025-12-04T09:35:30.9218080Z ##[endgroup] 2025-12-04T09:35:30.9589531Z Entering 'android/libs/fbjni' 2025-12-04T09:35:30.9629788Z Entering 'third_party/FP16' 2025-12-04T09:35:30.9666319Z Entering 'third_party/FXdiv' 2025-12-04T09:35:30.9702461Z Entering 'third_party/NNPACK' 2025-12-04T09:35:30.9741780Z Entering 'third_party/NVTX' 2025-12-04T09:35:30.9785633Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T09:35:30.9824203Z Entering 'third_party/XNNPACK' 2025-12-04T09:35:30.9968577Z Entering 'third_party/aiter' 2025-12-04T09:35:31.0020673Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T09:35:31.0147692Z Entering 'third_party/benchmark' 2025-12-04T09:35:31.0187744Z Entering 'third_party/composable_kernel' 2025-12-04T09:35:31.0321569Z Entering 'third_party/cpp-httplib' 2025-12-04T09:35:31.0359797Z Entering 'third_party/cpuinfo' 2025-12-04T09:35:31.0400002Z Entering 'third_party/cudnn_frontend' 2025-12-04T09:35:31.0439272Z Entering 'third_party/cutlass' 2025-12-04T09:35:31.0554224Z Entering 'third_party/fbgemm' 
2025-12-04T09:35:31.0622262Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T09:35:31.0656973Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T09:35:31.0789175Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T09:35:31.0828094Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T09:35:31.0938685Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T09:35:31.0976439Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T09:35:31.1009747Z Entering 'third_party/fbgemm/external/json' 2025-12-04T09:35:31.1059824Z Entering 'third_party/flash-attention' 2025-12-04T09:35:31.1105866Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T09:35:31.1218245Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T09:35:31.1320506Z Entering 'third_party/flatbuffers' 2025-12-04T09:35:31.1400661Z Entering 'third_party/fmt' 2025-12-04T09:35:31.1438227Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T09:35:31.1475206Z Entering 'third_party/gloo' 2025-12-04T09:35:31.1512955Z Entering 'third_party/googletest' 2025-12-04T09:35:31.1550835Z Entering 'third_party/ideep' 2025-12-04T09:35:31.1584596Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T09:35:31.1681987Z Entering 'third_party/ittapi' 2025-12-04T09:35:31.1721005Z Entering 'third_party/kineto' 2025-12-04T09:35:31.1760840Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T09:35:31.1801899Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T09:35:31.1851996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T09:35:31.1887237Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T09:35:31.1922857Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T09:35:31.1955825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T09:35:31.1991434Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T09:35:31.2029495Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T09:35:31.2066935Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T09:35:31.2118739Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T09:35:31.2153959Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T09:35:31.2190138Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:31.2246041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:31.2290066Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T09:35:31.2325600Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T09:35:31.2366191Z Entering 'third_party/kleidiai' 2025-12-04T09:35:31.2410864Z Entering 'third_party/mimalloc' 2025-12-04T09:35:31.2448931Z Entering 'third_party/nlohmann' 2025-12-04T09:35:31.2500243Z Entering 'third_party/onnx' 2025-12-04T09:35:31.2875672Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T09:35:31.2917792Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T09:35:31.2981113Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T09:35:31.3018016Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T09:35:31.3054687Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T09:35:31.3088702Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T09:35:31.3135972Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T09:35:31.3170364Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T09:35:31.3205362Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T09:35:31.3241246Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T09:35:31.3293958Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T09:35:31.3334358Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T09:35:31.3627175Z Entering 'third_party/pocketfft' 2025-12-04T09:35:31.3662967Z Entering 'third_party/protobuf' 2025-12-04T09:35:31.3752840Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T09:35:31.3787279Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T09:35:31.3828736Z Entering 'third_party/psimd' 2025-12-04T09:35:31.3863673Z Entering 'third_party/pthreadpool' 2025-12-04T09:35:31.3901295Z Entering 'third_party/pybind11' 2025-12-04T09:35:31.3939557Z Entering 'third_party/python-peachpy' 2025-12-04T09:35:31.3976421Z Entering 'third_party/sleef' 2025-12-04T09:35:31.4016097Z Entering 'third_party/tensorpipe' 2025-12-04T09:35:31.4054271Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T09:35:31.4090762Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T09:35:31.4125102Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T09:35:31.4163519Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T09:35:31.4197172Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T09:35:31.4372753Z Prepare all required actions 2025-12-04T09:35:31.4373399Z Getting action download info 2025-12-04T09:35:31.6502455Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T09:35:31.6502823Z env: 2025-12-04T09:35:31.6503077Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:31.6503372Z ##[endgroup] 2025-12-04T09:35:31.6552920Z ##[group]Run set -euo pipefail 2025-12-04T09:35:31.6553571Z set -euo pipefail 2025-12-04T09:35:31.6553999Z function get_ec2_metadata() { 2025-12-04T09:35:31.6554568Z  # Pulled from instance 
metadata endpoint for EC2 2025-12-04T09:35:31.6555449Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T09:35:31.6556216Z  category=$1 2025-12-04T09:35:31.6556746Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T09:35:31.6557465Z  runner_name_str=i-03bbda7791efb68ed 2025-12-04T09:35:31.6558075Z  if [[ -f /.inarc ]]; then 2025-12-04T09:35:31.6558559Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T09:35:31.6559202Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T09:35:31.6559859Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T09:35:31.6560401Z  else 2025-12-04T09:35:31.6561598Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T09:35:31.6562938Z  fi 2025-12-04T09:35:31.6563322Z } 2025-12-04T09:35:31.6563715Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T09:35:31.6564323Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T09:35:31.6565022Z echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T09:35:31.6565623Z echo "system info $(uname -a)" 2025-12-04T09:35:31.6573753Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:31.6574328Z env: 2025-12-04T09:35:31.6574688Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:31.6575091Z ##[endgroup] 2025-12-04T09:35:31.6738293Z ami-id: ami-08982f1c5bf93d976 2025-12-04T09:35:31.6856093Z instance-id: i-03bbda7791efb68ed 2025-12-04T09:35:31.6970651Z instance-type: g4dn.4xlarge 2025-12-04T09:35:31.6982670Z system info Linux ip-10-0-76-64.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T09:35:31.7006707Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T09:35:31.7007279Z if [ -f 
/usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T09:35:31.7014759Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:31.7015210Z env: 2025-12-04T09:35:31.7015450Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:31.7015760Z ##[endgroup] 2025-12-04T09:35:33.0650257Z Thu Dec 4 09:35:33 2025 2025-12-04T09:35:33.0651519Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:33.0652175Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T09:35:33.0652814Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:35:33.0653455Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:35:33.0654132Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:35:33.0654685Z | | | MIG M. | 2025-12-04T09:35:33.0655087Z |=========================================+========================+======================| 2025-12-04T09:35:33.0750629Z | 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:35:33.0751528Z | N/A 32C P0 28W / 70W | 0MiB / 15360MiB | 8% Default | 2025-12-04T09:35:33.0752013Z | | | N/A | 2025-12-04T09:35:33.0752501Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T09:35:33.0752905Z 2025-12-04T09:35:33.0753129Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:33.0753674Z | Processes: | 2025-12-04T09:35:33.0754234Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:35:33.0754743Z | ID ID Usage | 2025-12-04T09:35:33.0755175Z |=========================================================================================| 2025-12-04T09:35:33.0755724Z | No running processes found | 2025-12-04T09:35:33.0756323Z +-----------------------------------------------------------------------------------------+ 2025-12-04T09:35:33.4867783Z ##[group]Run echo 
"IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:35:33.4868919Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:35:33.4878400Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:33.4878852Z env: 2025-12-04T09:35:33.4879100Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:33.4879411Z ##[endgroup] 2025-12-04T09:35:33.4940391Z ##[group]Run if systemctl is-active --quiet docker; then 2025-12-04T09:35:33.4940929Z if systemctl is-active --quiet docker; then 2025-12-04T09:35:33.4941670Z  echo "Docker daemon is running..."; 2025-12-04T09:35:33.4942090Z else 2025-12-04T09:35:33.4942521Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-12-04T09:35:33.4943031Z fi 2025-12-04T09:35:33.4950025Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:33.4950477Z env: 2025-12-04T09:35:33.4950733Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:33.4951026Z ##[endgroup] 2025-12-04T09:35:33.5042874Z Docker daemon is running... 2025-12-04T09:35:33.5089381Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:35:33.5089724Z with: 2025-12-04T09:35:33.5089957Z shell: bash 2025-12-04T09:35:33.5090214Z timeout_minutes: 5 2025-12-04T09:35:33.5090497Z max_attempts: 3 2025-12-04T09:35:33.5090759Z retry_wait_seconds: 30 2025-12-04T09:35:33.5093518Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. 
META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-12-04T09:35:33.5096311Z polling_interval_seconds: 1 2025-12-04T09:35:33.5096650Z warning_on_retry: true 2025-12-04T09:35:33.5096962Z continue_on_error: false 2025-12-04T09:35:33.5097244Z env: 2025-12-04T09:35:33.5097483Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:33.5097793Z AWS_RETRY_MODE: standard 2025-12-04T09:35:33.5098083Z AWS_MAX_ATTEMPTS: 5 2025-12-04T09:35:33.5098372Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:35:33.5098690Z ##[endgroup] 2025-12-04T09:35:34.8338349Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:34.8339378Z Configure a credential helper to remove this warning. See 2025-12-04T09:35:34.8340054Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:34.8340529Z 2025-12-04T09:35:34.8340650Z Login Succeeded 2025-12-04T09:35:35.6047537Z Command completed after 1 attempt(s). 
2025-12-04T09:35:35.6103178Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:35.6103816Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:35.6104374Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:35:35.6113877Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.6114318Z env: 2025-12-04T09:35:35.6114595Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.6114890Z ##[endgroup] 2025-12-04T09:35:35.6201833Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T09:35:35.6202631Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T09:35:35.6203166Z # shellcheck disable=SC2046 2025-12-04T09:35:35.6203568Z docker stop $(docker ps -q) || true 2025-12-04T09:35:35.6203985Z # Prune all of the docker images 2025-12-04T09:35:35.6204363Z docker system prune -af 2025-12-04T09:35:35.6211367Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.6211821Z env: 2025-12-04T09:35:35.6212093Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.6212390Z ##[endgroup] 2025-12-04T09:35:35.6496000Z "docker stop" requires at least 1 argument. 2025-12-04T09:35:35.6496484Z See 'docker stop --help'. 2025-12-04T09:35:35.6496703Z 2025-12-04T09:35:35.6496893Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 
2025-12-04T09:35:35.6497211Z 2025-12-04T09:35:35.6513285Z Stop one or more running containers 2025-12-04T09:35:35.6715774Z Total reclaimed space: 0B 2025-12-04T09:35:35.6926493Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T09:35:35.6927070Z with: 2025-12-04T09:35:35.6928017Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.6929081Z use-custom-docker-registry: true 2025-12-04T09:35:35.6929449Z docker-build-dir: .ci/docker 2025-12-04T09:35:35.6929797Z docker-build-script: ./build.sh 2025-12-04T09:35:35.6930307Z working-directory: . 2025-12-04T09:35:35.6930702Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.6931173Z force-push: false 2025-12-04T09:35:35.6931436Z env: 2025-12-04T09:35:35.6931664Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.6931965Z ##[endgroup] 2025-12-04T09:35:35.6953083Z ##[group]Run set -ex 2025-12-04T09:35:35.6953419Z set -ex 2025-12-04T09:35:35.6953687Z  2025-12-04T09:35:35.6954282Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T09:35:35.6955077Z # gracefully return the docker image name as it is. Pulling docker image in Linux 2025-12-04T09:35:35.6955757Z # job could then download the pre-built image as usual 2025-12-04T09:35:35.6956573Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T09:35:35.6957324Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6957724Z else 2025-12-04T09:35:35.6958033Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6958551Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6959017Z  2025-12-04T09:35:35.6959676Z  echo "Not using custom ECR registry. 
Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T09:35:35.6960433Z  exit 0 2025-12-04T09:35:35.6960677Z fi 2025-12-04T09:35:35.6960918Z  2025-12-04T09:35:35.6961308Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T09:35:35.6962021Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T09:35:35.6962719Z  # use it as it is, but first let's extract the tag 2025-12-04T09:35:35.6963298Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T09:35:35.6963909Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6964489Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6964957Z else 2025-12-04T09:35:35.6965266Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T09:35:35.6965718Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T09:35:35.6966171Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T09:35:35.6966567Z  fi 2025-12-04T09:35:35.6967097Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T09:35:35.6967821Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6968566Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6969396Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.6969907Z fi 2025-12-04T09:35:35.6976902Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.6977342Z env: 2025-12-04T09:35:35.6977589Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.6977882Z REPO_NAME: pytorch 2025-12-04T09:35:35.6978972Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.6980025Z 
DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:35:35.6980364Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T09:35:35.6980797Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.6981277Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T09:35:35.6981620Z CUSTOM_TAG_PREFIX: 2025-12-04T09:35:35.6981888Z ##[endgroup] 2025-12-04T09:35:35.7010273Z + [[ -d .ci/docker ]] 2025-12-04T09:35:35.7010597Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T09:35:35.7011098Z + [[ true == \t\r\u\e ]] 2025-12-04T09:35:35.7011384Z + echo skip=false 2025-12-04T09:35:35.7012670Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T09:35:35.7019017Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7020366Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T09:35:35.7044755Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7046122Z + echo docker-tag=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7047633Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7073484Z ##[group]Run set +e 2025-12-04T09:35:35.7073842Z set +e 2025-12-04T09:35:35.7074184Z set -x 2025-12-04T09:35:35.7074462Z  2025-12-04T09:35:35.7074709Z login() { 2025-12-04T09:35:35.7075260Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:35:35.7075879Z } 2025-12-04T09:35:35.7076126Z  2025-12-04T09:35:35.7076351Z retry () { 2025-12-04T09:35:35.7076665Z  $* || 
(sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T09:35:35.7077029Z } 2025-12-04T09:35:35.7077247Z  2025-12-04T09:35:35.7077512Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:35:35.7077861Z  2025-12-04T09:35:35.7078105Z START_TIME=$(date +%s) 2025-12-04T09:35:35.7078430Z # Wait up to 120 minutes 2025-12-04T09:35:35.7078850Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T09:35:35.7079441Z  # Check if image already exists, if it does then skip building it 2025-12-04T09:35:35.7080010Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T09:35:35.7080438Z  exit 0 2025-12-04T09:35:35.7080704Z  fi 2025-12-04T09:35:35.7080935Z  2025-12-04T09:35:35.7081388Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T09:35:35.7082172Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T09:35:35.7083092Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T09:35:35.7083693Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T09:35:35.7084166Z  # It's a Docker build job, let's build the image 2025-12-04T09:35:35.7084573Z  break 2025-12-04T09:35:35.7084842Z  else 2025-12-04T09:35:35.7085258Z  # It's a regular build job, wait for the image to become available 2025-12-04T09:35:35.7085734Z  sleep 300 2025-12-04T09:35:35.7086015Z  fi 2025-12-04T09:35:35.7086266Z done 2025-12-04T09:35:35.7086497Z  2025-12-04T09:35:35.7086911Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T09:35:35.7087769Z # be empty. 
The default action would be to continue rebuild the image 2025-12-04T09:35:35.7088387Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T09:35:35.7088911Z  # if we're on the base branch then use the parent commit 2025-12-04T09:35:35.7089394Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T09:35:35.7089765Z else 2025-12-04T09:35:35.7090132Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T09:35:35.7090700Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T09:35:35.7091213Z fi 2025-12-04T09:35:35.7091439Z  2025-12-04T09:35:35.7091718Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T09:35:35.7092139Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T09:35:35.7092524Z  2025-12-04T09:35:35.7093060Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T09:35:35.7093717Z  exit 0 2025-12-04T09:35:35.7093974Z fi 2025-12-04T09:35:35.7094196Z  2025-12-04T09:35:35.7094549Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T09:35:35.7095359Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T09:35:35.7096054Z  exit 1 2025-12-04T09:35:35.7096293Z fi 2025-12-04T09:35:35.7096528Z  2025-12-04T09:35:35.7096947Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T09:35:35.7097727Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T09:35:35.7098425Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T09:35:35.7099235Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T09:35:35.7100152Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T09:35:35.7100676Z fi 2025-12-04T09:35:35.7101158Z  2025-12-04T09:35:35.7101462Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 
2025-12-04T09:35:35.7108173Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:35.7108614Z env: 2025-12-04T09:35:35.7108861Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:35.7109179Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T09:35:35.7109570Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:35:35.7110675Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7112010Z DOCKER_TAG: pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:35.7112807Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.7113250Z DOCKER_PUSH: 2025-12-04T09:35:35.7113510Z ##[endgroup] 2025-12-04T09:35:35.7140821Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.7141600Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:35.7145516Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:35:35.7146932Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:36.3142029Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:36.3142792Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:36.3143466Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:36.3143923Z 2025-12-04T09:35:36.3144040Z Login Succeeded 2025-12-04T09:35:36.3158600Z ++ date +%s 2025-12-04T09:35:36.3169423Z + START_TIME=1764840936 2025-12-04T09:35:36.3172977Z ++ date +%s 2025-12-04T09:35:36.3185325Z + [[ 1764833736 -lt 1764840936 ]] 2025-12-04T09:35:36.3186681Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:36.5440470Z { 2025-12-04T09:35:36.5440866Z "schemaVersion": 2, 2025-12-04T09:35:36.5441396Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T09:35:36.5441908Z "config": { 2025-12-04T09:35:36.5442362Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T09:35:36.5442842Z "size": 34787, 2025-12-04T09:35:36.5443593Z "digest": "sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024" 2025-12-04T09:35:36.5444142Z }, 2025-12-04T09:35:36.5444370Z "layers": [ 2025-12-04T09:35:36.5444599Z { 2025-12-04T09:35:36.5444972Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5445460Z "size": 30447951, 2025-12-04T09:35:36.5445967Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T09:35:36.5446507Z }, 2025-12-04T09:35:36.5446723Z { 2025-12-04T09:35:36.5447103Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5447576Z "size": 1554, 2025-12-04T09:35:36.5448043Z "digest": "sha256:835841cca3b7e1464290cdb78e48773e03583413fbed852c3cc5165a392ea44d" 2025-12-04T09:35:36.5448593Z }, 2025-12-04T09:35:36.5448794Z { 2025-12-04T09:35:36.5449273Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5449763Z "size": 313276213, 2025-12-04T09:35:36.5450273Z "digest": 
"sha256:1bf1bb125deaa5b8a3adf121671e87ba2fa7e229f9eb1dff7ade581cb737175a" 2025-12-04T09:35:36.5450825Z }, 2025-12-04T09:35:36.5451041Z { 2025-12-04T09:35:36.5451435Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5451906Z "size": 787, 2025-12-04T09:35:36.5452382Z "digest": "sha256:b21856d1bf420da6fa8ec7331b82ab355d4f4178644e7d3a3d3d0fbc3610109a" 2025-12-04T09:35:36.5452940Z }, 2025-12-04T09:35:36.5453142Z { 2025-12-04T09:35:36.5453515Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5453997Z "size": 106, 2025-12-04T09:35:36.5454468Z "digest": "sha256:848ba2c095e2b9e6acfb0ecf077adb526fb2fa82ed44cf6648ebde97f296f8ec" 2025-12-04T09:35:36.5455027Z }, 2025-12-04T09:35:36.5455243Z { 2025-12-04T09:35:36.5455601Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5456082Z "size": 704, 2025-12-04T09:35:36.5456558Z "digest": "sha256:029495b23122c840ca0e52d487afa8d2c4dbf1991cd7f204ec3e434dcf947bf4" 2025-12-04T09:35:36.5457110Z }, 2025-12-04T09:35:36.5457319Z { 2025-12-04T09:35:36.5457693Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5458175Z "size": 1216, 2025-12-04T09:35:36.5458638Z "digest": "sha256:073bb82063cfba4639b11fea43753dbb128f9238353189fc02d2e2aa0b2ad359" 2025-12-04T09:35:36.5459188Z }, 2025-12-04T09:35:36.5459406Z { 2025-12-04T09:35:36.5459765Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5460248Z "size": 484, 2025-12-04T09:35:36.5460713Z "digest": "sha256:59b63930883363c7d2aaab27cc61555d9f3e119dc18247a8624c98ebdaa354a5" 2025-12-04T09:35:36.5461286Z }, 2025-12-04T09:35:36.5461501Z { 2025-12-04T09:35:36.5461873Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5462344Z "size": 110362071, 2025-12-04T09:35:36.5462827Z "digest": "sha256:1c6177b2970db2d7743b4337c420a35f2ec79f338c30d97d534a1f0987c00913" 2025-12-04T09:35:36.5463373Z }, 
2025-12-04T09:35:36.5463589Z { 2025-12-04T09:35:36.5463945Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5464427Z "size": 4961, 2025-12-04T09:35:36.5464913Z "digest": "sha256:fabe466dd5f33c3209a56abf5cb46b9b07fe21c57fb43b98e13308c8665c0864" 2025-12-04T09:35:36.5465456Z }, 2025-12-04T09:35:36.5465675Z { 2025-12-04T09:35:36.5466226Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5466699Z "size": 1755, 2025-12-04T09:35:36.5467173Z "digest": "sha256:2b5a11b41761d8ea3b829e4772e4064cb6c4e4989126af324d0057661e4493a1" 2025-12-04T09:35:36.5467719Z }, 2025-12-04T09:35:36.5467923Z { 2025-12-04T09:35:36.5468304Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5468784Z "size": 724, 2025-12-04T09:35:36.5469243Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:36.5469841Z }, 2025-12-04T09:35:36.5470055Z { 2025-12-04T09:35:36.5470423Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5470891Z "size": 544, 2025-12-04T09:35:36.5471357Z "digest": "sha256:dc0780902fca810498f16efa71f8e5990385f141a0cfcc552616a4acc434f79a" 2025-12-04T09:35:36.5471905Z }, 2025-12-04T09:35:36.5472105Z { 2025-12-04T09:35:36.5472483Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5472969Z "size": 3185191720, 2025-12-04T09:35:36.5473454Z "digest": "sha256:5b09a2b135c8e540e2b9374b68991afdd63a5dfaba75fb44efe054a591f400c2" 2025-12-04T09:35:36.5474006Z }, 2025-12-04T09:35:36.5474220Z { 2025-12-04T09:35:36.5474580Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5475062Z "size": 32, 2025-12-04T09:35:36.5475538Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5476091Z }, 2025-12-04T09:35:36.5476295Z { 2025-12-04T09:35:36.5476671Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5477150Z "size": 396, 2025-12-04T09:35:36.5477623Z "digest": "sha256:5bfdaeb5578d6ffcd7db29c48303cbceb13c591210feaa216a8daa7a6d445b4b" 2025-12-04T09:35:36.5478184Z }, 2025-12-04T09:35:36.5478396Z { 2025-12-04T09:35:36.5478754Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5479242Z "size": 236865, 2025-12-04T09:35:36.5479712Z "digest": "sha256:0ef42867f370b8a14b8c301388793b78a0bd2533bb2a317b129b03c8667dc767" 2025-12-04T09:35:36.5480271Z }, 2025-12-04T09:35:36.5480475Z { 2025-12-04T09:35:36.5480851Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5481338Z "size": 230, 2025-12-04T09:35:36.5481792Z "digest": "sha256:446083e497f322789c2d87933a77fb2dfd94e18d2e85f6d4362e6e9521b82c4e" 2025-12-04T09:35:36.5482461Z }, 2025-12-04T09:35:36.5482684Z { 2025-12-04T09:35:36.5483050Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5483541Z "size": 3043500, 2025-12-04T09:35:36.5484030Z "digest": "sha256:d8a170bef0f4e0e28f5ba0952320dd465552adf74f0864b4f47cc11f4c4f82f7" 2025-12-04T09:35:36.5484589Z }, 2025-12-04T09:35:36.5484793Z { 2025-12-04T09:35:36.5485170Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5485655Z "size": 1472, 2025-12-04T09:35:36.5486132Z "digest": "sha256:e2b6cd6a5bd0418a1e4aca3f37942324d4d9f9b0177597e37fc8d1a5626048e1" 2025-12-04T09:35:36.5486689Z }, 2025-12-04T09:35:36.5486909Z { 2025-12-04T09:35:36.5487268Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5487745Z "size": 481, 2025-12-04T09:35:36.5488213Z "digest": "sha256:93efc0181a22218a544413f1d57e9e0e7a0f492e41bef598084c5b9177e3987a" 2025-12-04T09:35:36.5488746Z }, 2025-12-04T09:35:36.5488961Z { 2025-12-04T09:35:36.5489339Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5489817Z "size": 202, 
2025-12-04T09:35:36.5490286Z "digest": "sha256:7454c938f17425bcf167ad28a62b42b95f638a7d2cf0840885cfe5ffe8480a12" 2025-12-04T09:35:36.5490829Z }, 2025-12-04T09:35:36.5491039Z { 2025-12-04T09:35:36.5491396Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5491874Z "size": 607, 2025-12-04T09:35:36.5492435Z "digest": "sha256:4d57ff55f6d4161cb6c29e2c0b08d47e65898427db3938479158684899f0023d" 2025-12-04T09:35:36.5492968Z }, 2025-12-04T09:35:36.5493184Z { 2025-12-04T09:35:36.5493554Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5494028Z "size": 6243016141, 2025-12-04T09:35:36.5494523Z "digest": "sha256:b0301534b4a58072d5b140b08a7608bbead41d126fa29fdc78c1e8a43ebb865d" 2025-12-04T09:35:36.5495070Z }, 2025-12-04T09:35:36.5495272Z { 2025-12-04T09:35:36.5495645Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5496228Z "size": 829, 2025-12-04T09:35:36.5496699Z "digest": "sha256:1969e15d0c13874ea5883ed829235a19ef6dc21c8aa6172032b78a8ffa6ff262" 2025-12-04T09:35:36.5497232Z }, 2025-12-04T09:35:36.5497445Z { 2025-12-04T09:35:36.5497814Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5498285Z "size": 33450177, 2025-12-04T09:35:36.5498784Z "digest": "sha256:73180a0f2d5a961a0cc0ba2c3cf375fdcfb43ae5e4e5c63a000c4b4366d52a64" 2025-12-04T09:35:36.5499338Z }, 2025-12-04T09:35:36.5499573Z { 2025-12-04T09:35:36.5499950Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5500434Z "size": 104, 2025-12-04T09:35:36.5501145Z "digest": "sha256:ad81b25cb69f8cf42a4a96678a64b7d0598a8f95236a3e63d1fec4e53edff613" 2025-12-04T09:35:36.5501717Z }, 2025-12-04T09:35:36.5501937Z { 2025-12-04T09:35:36.5502301Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5502785Z "size": 1496, 2025-12-04T09:35:36.5503263Z "digest": 
"sha256:8165374f8dccf88a7791a5d31afbe29e4d4542b4f1cf1904945e07f9af6bf8ba" 2025-12-04T09:35:36.5503817Z }, 2025-12-04T09:35:36.5504018Z { 2025-12-04T09:35:36.5504389Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5504869Z "size": 458786969, 2025-12-04T09:35:36.5505353Z "digest": "sha256:7779c0bb9be2030df9060b526b98d0afeed1ce5b61ee0530321ef04a4e145e8c" 2025-12-04T09:35:36.5505909Z }, 2025-12-04T09:35:36.5506123Z { 2025-12-04T09:35:36.5506479Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5506960Z "size": 164, 2025-12-04T09:35:36.5507427Z "digest": "sha256:4d0a1c027262ed8c83181b931b64afa1c41c3cac97580231c4cae3a524ebd7d5" 2025-12-04T09:35:36.5507960Z }, 2025-12-04T09:35:36.5508174Z { 2025-12-04T09:35:36.5508551Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5509021Z "size": 346, 2025-12-04T09:35:36.5509486Z "digest": "sha256:a51e0dab2d596e6563483f27c12660007160847d177ba4c31812a8f44ada5754" 2025-12-04T09:35:36.5510022Z }, 2025-12-04T09:35:36.5510236Z { 2025-12-04T09:35:36.5510595Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5511071Z "size": 32, 2025-12-04T09:35:36.5511542Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5512080Z }, 2025-12-04T09:35:36.5512298Z { 2025-12-04T09:35:36.5512670Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5513135Z "size": 106, 2025-12-04T09:35:36.5513617Z "digest": "sha256:3eb6d4ff040b8761b1e3e1da768bdb884ce0e5324e3d0f6471b0a8b2ddf4736f" 2025-12-04T09:35:36.5514173Z }, 2025-12-04T09:35:36.5514374Z { 2025-12-04T09:35:36.5514746Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5515227Z "size": 424, 2025-12-04T09:35:36.5515700Z "digest": "sha256:b168858b85373f8ddca549d79267a06de4fa945d04bf791c55c9ddc93957fa3c" 2025-12-04T09:35:36.5516245Z }, 
2025-12-04T09:35:36.5516463Z { 2025-12-04T09:35:36.5516839Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5517310Z "size": 19309367, 2025-12-04T09:35:36.5517795Z "digest": "sha256:d77a39278026a8899e2f97643918bdcf96e711ca26951880b4841b319dc71321" 2025-12-04T09:35:36.5518336Z }, 2025-12-04T09:35:36.5518539Z { 2025-12-04T09:35:36.5519082Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5519574Z "size": 108, 2025-12-04T09:35:36.5520052Z "digest": "sha256:36fbd357280b6b40e90f36ac3d19da3da10e5dbf0027a5cfe8e2f29d1870d347" 2025-12-04T09:35:36.5520618Z }, 2025-12-04T09:35:36.5520837Z { 2025-12-04T09:35:36.5521203Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5521689Z "size": 826, 2025-12-04T09:35:36.5522165Z "digest": "sha256:4e3b10a5dd6aed29f238d604925e2a4f873141c1087c8dd4fdde5c61e7560893" 2025-12-04T09:35:36.5522928Z }, 2025-12-04T09:35:36.5523134Z { 2025-12-04T09:35:36.5523511Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5524001Z "size": 724, 2025-12-04T09:35:36.5524460Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:36.5525011Z }, 2025-12-04T09:35:36.5525230Z { 2025-12-04T09:35:36.5525600Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5526094Z "size": 149, 2025-12-04T09:35:36.5526570Z "digest": "sha256:3092fab73b59190b9facfc49bf18f58612172bc2fd68dfa339a1118632616939" 2025-12-04T09:35:36.5527110Z }, 2025-12-04T09:35:36.5527330Z { 2025-12-04T09:35:36.5527711Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5528176Z "size": 136, 2025-12-04T09:35:36.5528660Z "digest": "sha256:20020dd28a15ba092fcbfe906ee39cdddfcc9d0b7eb42fdd6f4c08a984fa9c00" 2025-12-04T09:35:36.5529223Z }, 2025-12-04T09:35:36.5529440Z { 2025-12-04T09:35:36.5529800Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5530281Z "size": 140, 2025-12-04T09:35:36.5530755Z "digest": "sha256:ae5280ce969dcff08c091e9a5f7641f13561b2b0ee44d78b7c3f81d8fe8e6d32" 2025-12-04T09:35:36.5531298Z }, 2025-12-04T09:35:36.5531512Z { 2025-12-04T09:35:36.5531882Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5532356Z "size": 32, 2025-12-04T09:35:36.5532832Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5533388Z }, 2025-12-04T09:35:36.5533595Z { 2025-12-04T09:35:36.5533963Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5534445Z "size": 223, 2025-12-04T09:35:36.5534915Z "digest": "sha256:026e4484b749dfc556dcf7c8f45c1759518a89072e4dbc974d9405ada1582d03" 2025-12-04T09:35:36.5535454Z }, 2025-12-04T09:35:36.5535672Z { 2025-12-04T09:35:36.5536053Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5536524Z "size": 256, 2025-12-04T09:35:36.5537015Z "digest": "sha256:1be9da2ce53d20d8befad5c024ee0eb41ee35984307cbd5621d8effae0353073" 2025-12-04T09:35:36.5537575Z }, 2025-12-04T09:35:36.5537780Z { 2025-12-04T09:35:36.5538153Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5538632Z "size": 32, 2025-12-04T09:35:36.5539093Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5539647Z }, 2025-12-04T09:35:36.5539860Z { 2025-12-04T09:35:36.5540222Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5540705Z "size": 106, 2025-12-04T09:35:36.5541172Z "digest": "sha256:6481b7a1d9fb4001fd6f9e2a8d1600192529ddb957128e41671ca4630fa06ad4" 2025-12-04T09:35:36.5541717Z }, 2025-12-04T09:35:36.5541920Z { 2025-12-04T09:35:36.5542294Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5542784Z "size": 312293471, 
2025-12-04T09:35:36.5543274Z "digest": "sha256:fa519d18c39d8f297109c056017ebce7efc322d058afd27fdac5880d6c8d35b0" 2025-12-04T09:35:36.5543825Z }, 2025-12-04T09:35:36.5544038Z { 2025-12-04T09:35:36.5544400Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5544885Z "size": 3058012325, 2025-12-04T09:35:36.5545485Z "digest": "sha256:d172f25b97f78fce0f6c6701f0db794b1c994a9cdf8cff9ddc6bdd1a1bea835c" 2025-12-04T09:35:36.5546039Z }, 2025-12-04T09:35:36.5546254Z { 2025-12-04T09:35:36.5546633Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5547107Z "size": 129, 2025-12-04T09:35:36.5547582Z "digest": "sha256:fd60ab6b1c2c85a932e9894b5d0cf5c9e75fa21782e3028ea40d76017ecfbf85" 2025-12-04T09:35:36.5548133Z }, 2025-12-04T09:35:36.5548345Z { 2025-12-04T09:35:36.5548704Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5549256Z "size": 880, 2025-12-04T09:35:36.5549730Z "digest": "sha256:0afe45579c2c87002db8c1abf7b32a748e6cb3b9b57e9b391f91cad9f84df476" 2025-12-04T09:35:36.5550271Z }, 2025-12-04T09:35:36.5550481Z { 2025-12-04T09:35:36.5550852Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5551320Z "size": 724, 2025-12-04T09:35:36.5551787Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:36.5552325Z }, 2025-12-04T09:35:36.5552527Z { 2025-12-04T09:35:36.5552899Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5553387Z "size": 139, 2025-12-04T09:35:36.5553854Z "digest": "sha256:5884ffd6720b47274f651262d5f9224f55960f9ea717faafe332aa20afb0ffa4" 2025-12-04T09:35:36.5554385Z }, 2025-12-04T09:35:36.5554608Z { 2025-12-04T09:35:36.5554982Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5555454Z "size": 32, 2025-12-04T09:35:36.5555934Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T09:35:36.5556489Z }, 2025-12-04T09:35:36.5556691Z { 2025-12-04T09:35:36.5557064Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5557549Z "size": 160, 2025-12-04T09:35:36.5558020Z "digest": "sha256:ab7a7c316fa7a9b7a96304ce96fafdffbc5cc6b960a4bb2def9131b36d9225c5" 2025-12-04T09:35:36.5558589Z }, 2025-12-04T09:35:36.5558802Z { 2025-12-04T09:35:36.5559160Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5559642Z "size": 1012, 2025-12-04T09:35:36.5560131Z "digest": "sha256:c7775ce5574bdde75b4c09a1db19f7d0dc027f1f4c1f961022fc55833133e616" 2025-12-04T09:35:36.5560685Z }, 2025-12-04T09:35:36.5560889Z { 2025-12-04T09:35:36.5561264Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5561749Z "size": 724, 2025-12-04T09:35:36.5562201Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:36.5562865Z }, 2025-12-04T09:35:36.5563084Z { 2025-12-04T09:35:36.5563451Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5563948Z "size": 134, 2025-12-04T09:35:36.5564427Z "digest": "sha256:81945c4fb228ca73f4bac38b6d8a1eca7139585d4a078219dfaa16ea13945949" 2025-12-04T09:35:36.5564978Z }, 2025-12-04T09:35:36.5565198Z { 2025-12-04T09:35:36.5565581Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5566058Z "size": 32, 2025-12-04T09:35:36.5566538Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5567091Z }, 2025-12-04T09:35:36.5567306Z { 2025-12-04T09:35:36.5567667Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5568146Z "size": 158, 2025-12-04T09:35:36.5568617Z "digest": "sha256:663cbe24d60bf42bc7a440cb4867e4287cacf54194dd3152406668e61d7e92e5" 2025-12-04T09:35:36.5569162Z }, 2025-12-04T09:35:36.5569378Z { 2025-12-04T09:35:36.5569752Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5570219Z "size": 603, 2025-12-04T09:35:36.5570675Z "digest": "sha256:43f216b027865c8ca16f855703465445f3a548614a4d7e29387337b9651ac25c" 2025-12-04T09:35:36.5571206Z }, 2025-12-04T09:35:36.5571405Z { 2025-12-04T09:35:36.5571880Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5572363Z "size": 724, 2025-12-04T09:35:36.5572828Z "digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084" 2025-12-04T09:35:36.5573356Z }, 2025-12-04T09:35:36.5573567Z { 2025-12-04T09:35:36.5573944Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5574411Z "size": 155, 2025-12-04T09:35:36.5574889Z "digest": "sha256:c47c3cfeb68763aa19727693ad52fe0c80561a98139adaa2ab5eccea35c2d1b4" 2025-12-04T09:35:36.5575511Z }, 2025-12-04T09:35:36.5575710Z { 2025-12-04T09:35:36.5576086Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5576570Z "size": 32, 2025-12-04T09:35:36.5577028Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5577583Z }, 2025-12-04T09:35:36.5577796Z { 2025-12-04T09:35:36.5578155Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5578641Z "size": 188, 2025-12-04T09:35:36.5579111Z "digest": "sha256:7d326b9e267322de9337ac2a71ddeac4cb61f28a018a6155863f83a164ad9437" 2025-12-04T09:35:36.5579655Z }, 2025-12-04T09:35:36.5579854Z { 2025-12-04T09:35:36.5580227Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5580714Z "size": 1370, 2025-12-04T09:35:36.5581181Z "digest": "sha256:7ec8f17141c8335192fa21b660dfe1fe0ad16b202bc234e7d4ef063b35124158" 2025-12-04T09:35:36.5581732Z }, 2025-12-04T09:35:36.5581945Z { 2025-12-04T09:35:36.5582315Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5582797Z "size": 32, 
2025-12-04T09:35:36.5583267Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5583807Z }, 2025-12-04T09:35:36.5584021Z { 2025-12-04T09:35:36.5584390Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5584857Z "size": 136, 2025-12-04T09:35:36.5585332Z "digest": "sha256:26249ea175bf816b87c4c83e5efb78fd386a800fa10e819ba85b06858bcf877e" 2025-12-04T09:35:36.5585877Z }, 2025-12-04T09:35:36.5586090Z { 2025-12-04T09:35:36.5586454Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5586938Z "size": 529, 2025-12-04T09:35:36.5587408Z "digest": "sha256:5e8e9ccb36f30a8c3a7e6a5011ee5001152f36c9c749397f3e234b1822326dd0" 2025-12-04T09:35:36.5587947Z }, 2025-12-04T09:35:36.5588161Z { 2025-12-04T09:35:36.5588533Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5589007Z "size": 32, 2025-12-04T09:35:36.5589479Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5590033Z }, 2025-12-04T09:35:36.5590231Z { 2025-12-04T09:35:36.5590598Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5591076Z "size": 104, 2025-12-04T09:35:36.5591548Z "digest": "sha256:5bc72d4e1de83a1a254e8808f727118dd54cf048c14ff298a5299e015a116bfd" 2025-12-04T09:35:36.5592083Z }, 2025-12-04T09:35:36.5592297Z { 2025-12-04T09:35:36.5592669Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5593139Z "size": 436, 2025-12-04T09:35:36.5593609Z "digest": "sha256:83cddbd497794c27254e11c4c00105d1f61399e7fef9d208a0be250724efd2c0" 2025-12-04T09:35:36.5594160Z }, 2025-12-04T09:35:36.5594363Z { 2025-12-04T09:35:36.5594740Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5595236Z "size": 32, 2025-12-04T09:35:36.5595697Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 
2025-12-04T09:35:36.5596256Z }, 2025-12-04T09:35:36.5596470Z { 2025-12-04T09:35:36.5596829Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5597320Z "size": 109, 2025-12-04T09:35:36.5613709Z "digest": "sha256:60c25d8c3dd2d78785f659204d0b1e64954ca581f89874b68ffe8fee23c6b661" 2025-12-04T09:35:36.5614292Z }, 2025-12-04T09:35:36.5614518Z { 2025-12-04T09:35:36.5614905Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5615387Z "size": 1896, 2025-12-04T09:35:36.5615883Z "digest": "sha256:a534dcf4b9a9e5fabed742c8a8fc43c9cfe7346ea88ab3c177c3b14fd3afe00a" 2025-12-04T09:35:36.5616451Z }, 2025-12-04T09:35:36.5616657Z { 2025-12-04T09:35:36.5617034Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5617521Z "size": 245582017, 2025-12-04T09:35:36.5618117Z "digest": "sha256:10138310c65c78d7de8375225ce37f5f7bfae7898e4e8bbcb90bd56a1bd05db4" 2025-12-04T09:35:36.5618720Z }, 2025-12-04T09:35:36.5618936Z { 2025-12-04T09:35:36.5619311Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5619775Z "size": 106, 2025-12-04T09:35:36.5620252Z "digest": "sha256:8487679f252b6fb703dc9398d73aaeec68df724bfc961579ec5bdae62ebe3a37" 2025-12-04T09:35:36.5620809Z }, 2025-12-04T09:35:36.5621011Z { 2025-12-04T09:35:36.5621386Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5621867Z "size": 162, 2025-12-04T09:35:36.5622330Z "digest": "sha256:52580ee2caa9ab69b0ac640315ee350e847cd0955c0a1eafa933a076669e87ad" 2025-12-04T09:35:36.5622881Z }, 2025-12-04T09:35:36.5623095Z { 2025-12-04T09:35:36.5623453Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5623935Z "size": 7944, 2025-12-04T09:35:36.5624421Z "digest": "sha256:741c215cb2ffb295ab6a07fab3f0dfdde029463779ff9c0bbff4add26a340cfb" 2025-12-04T09:35:36.5624984Z }, 2025-12-04T09:35:36.5625187Z { 2025-12-04T09:35:36.5625557Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5626038Z "size": 8070, 2025-12-04T09:35:36.5626489Z "digest": "sha256:d17f5aba17a608d1c7851cb3940a25d43f063385813051127074f693d0ede19b" 2025-12-04T09:35:36.5627032Z }, 2025-12-04T09:35:36.5627247Z { 2025-12-04T09:35:36.5627613Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5628098Z "size": 304, 2025-12-04T09:35:36.5628581Z "digest": "sha256:bc08246bb4ba18c3ec5bc69e16b6b4e929c5bd0f3fae10eeb0b1a622a63d6fa2" 2025-12-04T09:35:36.5629133Z }, 2025-12-04T09:35:36.5629346Z { 2025-12-04T09:35:36.5629719Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5630190Z "size": 23755574, 2025-12-04T09:35:36.5630679Z "digest": "sha256:7323bf084bf98f915db061b178c56525a0f95bd34d211b381c7527ad242c5a58" 2025-12-04T09:35:36.5631228Z }, 2025-12-04T09:35:36.5631438Z { 2025-12-04T09:35:36.5631794Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5632274Z "size": 108, 2025-12-04T09:35:36.5632758Z "digest": "sha256:d344ecc97fd77c7d12fd68ddb67aeb6cc3dd2e723de5ad1ca2c80b45c8d6bd77" 2025-12-04T09:35:36.5633310Z }, 2025-12-04T09:35:36.5633522Z { 2025-12-04T09:35:36.5633900Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5634376Z "size": 54145663, 2025-12-04T09:35:36.5634864Z "digest": "sha256:fb60b2d2147ff57c218f449f5b680132af8f7f8032ed69f422b48a3c3c1424f4" 2025-12-04T09:35:36.5635412Z }, 2025-12-04T09:35:36.5635613Z { 2025-12-04T09:35:36.5635984Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T09:35:36.5636463Z "size": 32, 2025-12-04T09:35:36.5636942Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T09:35:36.5637489Z } 2025-12-04T09:35:36.5637706Z ] 2025-12-04T09:35:36.5637919Z } 2025-12-04T09:35:36.5638160Z + exit 0 2025-12-04T09:35:36.5668436Z ##[group]Run set -eux 2025-12-04T09:35:36.5668766Z set 
-eux 2025-12-04T09:35:36.5669264Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T09:35:36.5670760Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T09:35:36.5679091Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:36.5679530Z env: 2025-12-04T09:35:36.5679783Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:36.5680072Z ##[endgroup] 2025-12-04T09:35:36.5712030Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T09:35:36.5712779Z + jq --raw-output .SecretString 2025-12-04T09:35:36.5714179Z + jq -r .docker_hub_readonly_token 2025-12-04T09:35:36.5715128Z + docker login --username pytorchbot --password-stdin 2025-12-04T09:35:37.2191286Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:37.2192017Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:37.2192691Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:37.2193533Z 2025-12-04T09:35:37.2193778Z Login Succeeded 2025-12-04T09:35:37.2284836Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:35:37.2285280Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T09:35:37.2285752Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T09:35:37.2292535Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:37.2292979Z env: 2025-12-04T09:35:37.2293232Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:37.2294217Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.2295239Z ##[endgroup] 2025-12-04T09:35:37.2325610Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.2376612Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T09:35:37.2377138Z with: 2025-12-04T09:35:37.2378045Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.2379187Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:37.2379648Z env: 2025-12-04T09:35:37.2379892Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:37.2380183Z ##[endgroup] 2025-12-04T09:35:37.2396588Z ##[group]Run set -x 2025-12-04T09:35:37.2396909Z set -x 2025-12-04T09:35:37.2397161Z set +e 2025-12-04T09:35:37.2397420Z  2025-12-04T09:35:37.2397683Z login() { 2025-12-04T09:35:37.2398231Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T09:35:37.2398847Z } 2025-12-04T09:35:37.2399087Z  2025-12-04T09:35:37.2399364Z retry () { 2025-12-04T09:35:37.2399670Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 
2025-12-04T09:35:37.2400037Z } 2025-12-04T09:35:37.2400273Z  2025-12-04T09:35:37.2400541Z retry login "${DOCKER_REGISTRY}" 2025-12-04T09:35:37.2401127Z  2025-12-04T09:35:37.2401709Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T09:35:37.2402588Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T09:35:37.2403015Z  2025-12-04T09:35:37.2403260Z set -e 2025-12-04T09:35:37.2403661Z # ignore output since only exit code is used for conditional 2025-12-04T09:35:37.2404261Z # only pull docker image if it's not available locally 2025-12-04T09:35:37.2404896Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T09:35:37.2405502Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T09:35:37.2405875Z fi 2025-12-04T09:35:37.2412175Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:35:37.2412621Z env: 2025-12-04T09:35:37.2412869Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:35:37.2413832Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.2414970Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:37.2415426Z ##[endgroup] 2025-12-04T09:35:37.2441727Z + set +e 2025-12-04T09:35:37.2442429Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:37.2443205Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:37.2445817Z + aws ecr get-login-password --region us-east-1 2025-12-04T09:35:37.2447240Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T09:35:37.8551727Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T09:35:37.8552442Z Configure a credential helper to remove this warning. 
See 2025-12-04T09:35:37.8553461Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T09:35:37.8554074Z 2025-12-04T09:35:37.8554208Z Login Succeeded 2025-12-04T09:35:37.8574384Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:37.8575529Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T09:35:38.0720673Z + IMAGE_SIZE=13438.219573020935 2025-12-04T09:35:38.0721528Z + echo 'Compressed size of image in MB: 13438.219573020935' 2025-12-04T09:35:38.0722018Z + set -e 2025-12-04T09:35:38.0723128Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:38.0724639Z Compressed size of image in MB: 13438.219573020935 2025-12-04T09:35:38.0845114Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:38.3318710Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:35:38.3320950Z pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T09:35:38.3322488Z 63e5bc7682b8: Pulling fs layer 2025-12-04T09:35:38.3322917Z 835841cca3b7: Pulling fs layer 2025-12-04T09:35:38.3323385Z 1bf1bb125dea: Pulling fs layer 2025-12-04T09:35:38.3323733Z b21856d1bf42: Pulling fs layer 2025-12-04T09:35:38.3324297Z 848ba2c095e2: Pulling fs layer 2025-12-04T09:35:38.3324736Z 029495b23122: Pulling fs layer 2025-12-04T09:35:38.3325109Z 073bb82063cf: Pulling fs layer 2025-12-04T09:35:38.3325452Z 59b639308833: Pulling fs layer 2025-12-04T09:35:38.3325770Z 
1c6177b2970d: Pulling fs layer 2025-12-04T09:35:38.3326079Z fabe466dd5f3: Pulling fs layer 2025-12-04T09:35:38.3326472Z 2b5a11b41761: Pulling fs layer 2025-12-04T09:35:38.3326907Z 9681563a88ff: Pulling fs layer 2025-12-04T09:35:38.3327229Z dc0780902fca: Pulling fs layer 2025-12-04T09:35:38.3327716Z 5b09a2b135c8: Pulling fs layer 2025-12-04T09:35:38.3328143Z 4f4fb700ef54: Pulling fs layer 2025-12-04T09:35:38.3328492Z 5bfdaeb5578d: Pulling fs layer 2025-12-04T09:35:38.3328930Z 848ba2c095e2: Waiting 2025-12-04T09:35:38.3329276Z 0ef42867f370: Pulling fs layer 2025-12-04T09:35:38.3329662Z 446083e497f3: Pulling fs layer 2025-12-04T09:35:38.3330077Z d8a170bef0f4: Pulling fs layer 2025-12-04T09:35:38.3330472Z e2b6cd6a5bd0: Pulling fs layer 2025-12-04T09:35:38.3330860Z 93efc0181a22: Pulling fs layer 2025-12-04T09:35:38.3331167Z 7454c938f174: Pulling fs layer 2025-12-04T09:35:38.3331553Z 4d57ff55f6d4: Pulling fs layer 2025-12-04T09:35:38.3331873Z 5b09a2b135c8: Waiting 2025-12-04T09:35:38.3332134Z 073bb82063cf: Waiting 2025-12-04T09:35:38.3332406Z 4f4fb700ef54: Waiting 2025-12-04T09:35:38.3332687Z b0301534b4a5: Pulling fs layer 2025-12-04T09:35:38.3332993Z 1969e15d0c13: Pulling fs layer 2025-12-04T09:35:38.3333299Z 446083e497f3: Waiting 2025-12-04T09:35:38.3333582Z 73180a0f2d5a: Pulling fs layer 2025-12-04T09:35:38.3333890Z d8a170bef0f4: Waiting 2025-12-04T09:35:38.3334164Z ad81b25cb69f: Pulling fs layer 2025-12-04T09:35:38.3334479Z 0ef42867f370: Waiting 2025-12-04T09:35:38.3334754Z e2b6cd6a5bd0: Waiting 2025-12-04T09:35:38.3335083Z 8165374f8dcc: Pulling fs layer 2025-12-04T09:35:38.3335380Z 9681563a88ff: Waiting 2025-12-04T09:35:38.3335651Z 93efc0181a22: Waiting 2025-12-04T09:35:38.3336169Z 7779c0bb9be2: Pulling fs layer 2025-12-04T09:35:38.3336470Z b0301534b4a5: Waiting 2025-12-04T09:35:38.3336747Z fabe466dd5f3: Waiting 2025-12-04T09:35:38.3337020Z 1969e15d0c13: Waiting 2025-12-04T09:35:38.3337274Z 73180a0f2d5a: Waiting 2025-12-04T09:35:38.3337553Z 4d57ff55f6d4: Waiting 
2025-12-04T09:35:38.3337819Z 2b5a11b41761: Waiting 2025-12-04T09:35:38.3338137Z 4d0a1c027262: Pulling fs layer 2025-12-04T09:35:38.3338526Z b21856d1bf42: Waiting 2025-12-04T09:35:38.3338956Z 8165374f8dcc: Waiting 2025-12-04T09:35:38.3339409Z 7779c0bb9be2: Waiting 2025-12-04T09:35:38.3339917Z a51e0dab2d59: Pulling fs layer 2025-12-04T09:35:38.3340475Z dc0780902fca: Waiting 2025-12-04T09:35:38.3340896Z 7454c938f174: Waiting 2025-12-04T09:35:38.3341211Z 4d0a1c027262: Waiting 2025-12-04T09:35:38.3341486Z a51e0dab2d59: Waiting 2025-12-04T09:35:38.3341778Z 3eb6d4ff040b: Pulling fs layer 2025-12-04T09:35:38.3342079Z 029495b23122: Waiting 2025-12-04T09:35:38.3342361Z b168858b8537: Pulling fs layer 2025-12-04T09:35:38.3342693Z d77a39278026: Pulling fs layer 2025-12-04T09:35:38.3342995Z 5bfdaeb5578d: Waiting 2025-12-04T09:35:38.3343280Z 3eb6d4ff040b: Waiting 2025-12-04T09:35:38.3343573Z 36fbd357280b: Pulling fs layer 2025-12-04T09:35:38.3343872Z d77a39278026: Waiting 2025-12-04T09:35:38.3344336Z b168858b8537: Waiting 2025-12-04T09:35:38.3344625Z 4e3b10a5dd6a: Pulling fs layer 2025-12-04T09:35:38.3344924Z 1c6177b2970d: Waiting 2025-12-04T09:35:38.3345196Z 36fbd357280b: Waiting 2025-12-04T09:35:38.3345485Z 3092fab73b59: Pulling fs layer 2025-12-04T09:35:38.3345808Z 20020dd28a15: Pulling fs layer 2025-12-04T09:35:38.3346119Z ae5280ce969d: Pulling fs layer 2025-12-04T09:35:38.3346429Z 4e3b10a5dd6a: Waiting 2025-12-04T09:35:38.3346702Z 3092fab73b59: Waiting 2025-12-04T09:35:38.3346957Z 20020dd28a15: Waiting 2025-12-04T09:35:38.3347238Z 026e4484b749: Pulling fs layer 2025-12-04T09:35:38.3347542Z ae5280ce969d: Waiting 2025-12-04T09:35:38.3347817Z 1be9da2ce53d: Pulling fs layer 2025-12-04T09:35:38.3348141Z 6481b7a1d9fb: Pulling fs layer 2025-12-04T09:35:38.3348456Z 026e4484b749: Waiting 2025-12-04T09:35:38.3348719Z 1be9da2ce53d: Waiting 2025-12-04T09:35:38.3349002Z fa519d18c39d: Pulling fs layer 2025-12-04T09:35:38.3349317Z 6481b7a1d9fb: Waiting 2025-12-04T09:35:38.3349595Z 
d172f25b97f7: Pulling fs layer 2025-12-04T09:35:38.3349919Z fd60ab6b1c2c: Pulling fs layer 2025-12-04T09:35:38.3350226Z fa519d18c39d: Waiting 2025-12-04T09:35:38.3350481Z d172f25b97f7: Waiting 2025-12-04T09:35:38.3350764Z 0afe45579c2c: Pulling fs layer 2025-12-04T09:35:38.3351075Z fd60ab6b1c2c: Waiting 2025-12-04T09:35:38.3351347Z 5884ffd6720b: Pulling fs layer 2025-12-04T09:35:38.3351653Z 0afe45579c2c: Waiting 2025-12-04T09:35:38.3351938Z ab7a7c316fa7: Pulling fs layer 2025-12-04T09:35:38.3352244Z 5884ffd6720b: Waiting 2025-12-04T09:35:38.3352508Z c7775ce5574b: Pulling fs layer 2025-12-04T09:35:38.3352812Z c7775ce5574b: Waiting 2025-12-04T09:35:38.3353090Z 81945c4fb228: Pulling fs layer 2025-12-04T09:35:38.3353387Z ab7a7c316fa7: Waiting 2025-12-04T09:35:38.3353679Z 663cbe24d60b: Pulling fs layer 2025-12-04T09:35:38.3354002Z 43f216b02786: Pulling fs layer 2025-12-04T09:35:38.3354294Z 81945c4fb228: Waiting 2025-12-04T09:35:38.3354568Z 43f216b02786: Waiting 2025-12-04T09:35:38.3354850Z c47c3cfeb687: Pulling fs layer 2025-12-04T09:35:38.3355153Z 663cbe24d60b: Waiting 2025-12-04T09:35:38.3355434Z 7d326b9e2673: Pulling fs layer 2025-12-04T09:35:38.3355745Z c47c3cfeb687: Waiting 2025-12-04T09:35:38.3356015Z 7ec8f17141c8: Pulling fs layer 2025-12-04T09:35:38.3356321Z 7d326b9e2673: Waiting 2025-12-04T09:35:38.3356600Z 26249ea175bf: Pulling fs layer 2025-12-04T09:35:38.3356907Z 5e8e9ccb36f3: Pulling fs layer 2025-12-04T09:35:38.3357218Z 7ec8f17141c8: Waiting 2025-12-04T09:35:38.3357488Z 26249ea175bf: Waiting 2025-12-04T09:35:38.3357763Z 5bc72d4e1de8: Pulling fs layer 2025-12-04T09:35:38.3358087Z 83cddbd49779: Pulling fs layer 2025-12-04T09:35:38.3358414Z 60c25d8c3dd2: Pulling fs layer 2025-12-04T09:35:38.3358738Z a534dcf4b9a9: Pulling fs layer 2025-12-04T09:35:38.3359158Z 5bc72d4e1de8: Waiting 2025-12-04T09:35:38.3359441Z 10138310c65c: Pulling fs layer 2025-12-04T09:35:38.3359810Z 60c25d8c3dd2: Waiting 2025-12-04T09:35:38.3360132Z 83cddbd49779: Waiting 
2025-12-04T09:35:38.3360413Z 8487679f252b: Pulling fs layer 2025-12-04T09:35:38.3360726Z a534dcf4b9a9: Waiting 2025-12-04T09:35:38.3360995Z 52580ee2caa9: Pulling fs layer 2025-12-04T09:35:38.3361386Z 741c215cb2ff: Pulling fs layer 2025-12-04T09:35:38.3361923Z d17f5aba17a6: Pulling fs layer 2025-12-04T09:35:38.3362542Z 10138310c65c: Waiting 2025-12-04T09:35:38.3362982Z 52580ee2caa9: Waiting 2025-12-04T09:35:38.3363468Z bc08246bb4ba: Pulling fs layer 2025-12-04T09:35:38.3363971Z d17f5aba17a6: Waiting 2025-12-04T09:35:38.3364438Z 741c215cb2ff: Waiting 2025-12-04T09:35:38.3364916Z 7323bf084bf9: Pulling fs layer 2025-12-04T09:35:38.3365424Z bc08246bb4ba: Waiting 2025-12-04T09:35:38.3365869Z d344ecc97fd7: Pulling fs layer 2025-12-04T09:35:38.3366244Z fb60b2d2147f: Pulling fs layer 2025-12-04T09:35:38.3366555Z 7323bf084bf9: Waiting 2025-12-04T09:35:38.3366883Z d344ecc97fd7: Waiting 2025-12-04T09:35:38.3367149Z 8487679f252b: Waiting 2025-12-04T09:35:38.3367476Z fb60b2d2147f: Waiting 2025-12-04T09:35:38.4026485Z 835841cca3b7: Download complete 2025-12-04T09:35:38.4773573Z b21856d1bf42: Verifying Checksum 2025-12-04T09:35:38.4774017Z b21856d1bf42: Download complete 2025-12-04T09:35:38.5656352Z 848ba2c095e2: Download complete 2025-12-04T09:35:38.6437962Z 029495b23122: Download complete 2025-12-04T09:35:38.6859973Z 63e5bc7682b8: Verifying Checksum 2025-12-04T09:35:38.6860573Z 63e5bc7682b8: Download complete 2025-12-04T09:35:38.7601251Z 59b639308833: Download complete 2025-12-04T09:35:38.7729943Z 073bb82063cf: Verifying Checksum 2025-12-04T09:35:38.8775829Z 073bb82063cf: Download complete 2025-12-04T09:35:38.8776244Z fabe466dd5f3: Download complete 2025-12-04T09:35:38.9504333Z 2b5a11b41761: Download complete 2025-12-04T09:35:39.0219868Z 9681563a88ff: Verifying Checksum 2025-12-04T09:35:39.0220306Z 9681563a88ff: Download complete 2025-12-04T09:35:39.0952039Z dc0780902fca: Download complete 2025-12-04T09:35:39.6988056Z 63e5bc7682b8: Pull complete 2025-12-04T09:35:39.7240595Z 
835841cca3b7: Pull complete 2025-12-04T09:35:39.9351872Z 1c6177b2970d: Verifying Checksum 2025-12-04T09:35:39.9352330Z 1c6177b2970d: Download complete 2025-12-04T09:35:40.0217585Z 5bfdaeb5578d: Verifying Checksum 2025-12-04T09:35:40.0218027Z 5bfdaeb5578d: Download complete 2025-12-04T09:35:40.1169938Z 0ef42867f370: Download complete 2025-12-04T09:35:40.1892097Z 446083e497f3: Verifying Checksum 2025-12-04T09:35:40.1892759Z 446083e497f3: Download complete 2025-12-04T09:35:40.3014746Z d8a170bef0f4: Verifying Checksum 2025-12-04T09:35:40.3015198Z d8a170bef0f4: Download complete 2025-12-04T09:35:40.3633398Z e2b6cd6a5bd0: Download complete 2025-12-04T09:35:40.4368059Z 93efc0181a22: Verifying Checksum 2025-12-04T09:35:40.4368512Z 93efc0181a22: Download complete 2025-12-04T09:35:40.5291129Z 7454c938f174: Verifying Checksum 2025-12-04T09:35:40.5291546Z 7454c938f174: Download complete 2025-12-04T09:35:40.6109950Z 4d57ff55f6d4: Download complete 2025-12-04T09:35:41.5176513Z 1bf1bb125dea: Verifying Checksum 2025-12-04T09:35:41.5176955Z 1bf1bb125dea: Download complete 2025-12-04T09:35:41.6069828Z 1969e15d0c13: Verifying Checksum 2025-12-04T09:35:41.6070514Z 1969e15d0c13: Download complete 2025-12-04T09:35:41.9858347Z 73180a0f2d5a: Verifying Checksum 2025-12-04T09:35:41.9859052Z 73180a0f2d5a: Download complete 2025-12-04T09:35:42.0756489Z ad81b25cb69f: Verifying Checksum 2025-12-04T09:35:42.0757180Z ad81b25cb69f: Download complete 2025-12-04T09:35:42.1697434Z 8165374f8dcc: Verifying Checksum 2025-12-04T09:35:42.1698077Z 8165374f8dcc: Download complete 2025-12-04T09:35:49.3074024Z 7779c0bb9be2: Verifying Checksum 2025-12-04T09:35:49.3074458Z 7779c0bb9be2: Download complete 2025-12-04T09:35:49.3855256Z 4d0a1c027262: Verifying Checksum 2025-12-04T09:35:49.3855692Z 4d0a1c027262: Download complete 2025-12-04T09:35:49.4753820Z a51e0dab2d59: Verifying Checksum 2025-12-04T09:35:49.4754376Z a51e0dab2d59: Download complete 2025-12-04T09:35:49.5650574Z 3eb6d4ff040b: Verifying Checksum 
2025-12-04T09:35:49.5651006Z 3eb6d4ff040b: Download complete 2025-12-04T09:35:49.6447329Z b168858b8537: Verifying Checksum 2025-12-04T09:35:49.6447759Z b168858b8537: Download complete 2025-12-04T09:35:50.1389068Z d77a39278026: Verifying Checksum 2025-12-04T09:35:50.1389500Z d77a39278026: Download complete 2025-12-04T09:35:50.2654951Z 36fbd357280b: Verifying Checksum 2025-12-04T09:35:50.2655377Z 36fbd357280b: Download complete 2025-12-04T09:35:50.3542715Z 4e3b10a5dd6a: Verifying Checksum 2025-12-04T09:35:50.3543102Z 4e3b10a5dd6a: Download complete 2025-12-04T09:35:50.3777890Z 1bf1bb125dea: Pull complete 2025-12-04T09:35:50.4539598Z 3092fab73b59: Verifying Checksum 2025-12-04T09:35:50.4540033Z 3092fab73b59: Download complete 2025-12-04T09:35:50.5453564Z 20020dd28a15: Verifying Checksum 2025-12-04T09:35:50.5453977Z 20020dd28a15: Download complete 2025-12-04T09:35:50.5875509Z b21856d1bf42: Pull complete 2025-12-04T09:35:50.6047991Z ae5280ce969d: Verifying Checksum 2025-12-04T09:35:50.6048369Z ae5280ce969d: Download complete 2025-12-04T09:35:50.7028839Z 026e4484b749: Verifying Checksum 2025-12-04T09:35:50.7029271Z 026e4484b749: Download complete 2025-12-04T09:35:50.7777399Z 848ba2c095e2: Pull complete 2025-12-04T09:35:50.8004378Z 1be9da2ce53d: Verifying Checksum 2025-12-04T09:35:50.8004758Z 1be9da2ce53d: Download complete 2025-12-04T09:35:50.8748049Z 6481b7a1d9fb: Verifying Checksum 2025-12-04T09:35:50.8748450Z 6481b7a1d9fb: Download complete 2025-12-04T09:35:50.9925424Z 029495b23122: Pull complete 2025-12-04T09:35:51.1589274Z 073bb82063cf: Pull complete 2025-12-04T09:35:51.2672407Z 59b639308833: Pull complete 2025-12-04T09:35:53.9234675Z 1c6177b2970d: Pull complete 2025-12-04T09:35:54.1438146Z fabe466dd5f3: Pull complete 2025-12-04T09:35:54.3707497Z 2b5a11b41761: Pull complete 2025-12-04T09:35:54.5943671Z 9681563a88ff: Pull complete 2025-12-04T09:35:54.8093076Z dc0780902fca: Pull complete 2025-12-04T09:35:55.7681823Z fa519d18c39d: Verifying Checksum 
2025-12-04T09:35:55.7682538Z fa519d18c39d: Download complete 2025-12-04T09:36:25.4928213Z 5b09a2b135c8: Download complete 2025-12-04T09:36:25.5833939Z fd60ab6b1c2c: Download complete 2025-12-04T09:36:25.6823415Z 0afe45579c2c: Download complete 2025-12-04T09:36:25.7457950Z 5884ffd6720b: Verifying Checksum 2025-12-04T09:36:25.7460813Z 5884ffd6720b: Download complete 2025-12-04T09:36:25.8568706Z ab7a7c316fa7: Download complete 2025-12-04T09:36:25.9644531Z c7775ce5574b: Verifying Checksum 2025-12-04T09:36:25.9645063Z c7775ce5574b: Download complete 2025-12-04T09:36:26.0692352Z 81945c4fb228: Verifying Checksum 2025-12-04T09:36:26.0692933Z 81945c4fb228: Download complete 2025-12-04T09:36:26.1756113Z 663cbe24d60b: Verifying Checksum 2025-12-04T09:36:26.1756563Z 663cbe24d60b: Download complete 2025-12-04T09:36:26.2658065Z 43f216b02786: Verifying Checksum 2025-12-04T09:36:26.2658744Z 43f216b02786: Download complete 2025-12-04T09:36:26.3573525Z c47c3cfeb687: Verifying Checksum 2025-12-04T09:36:26.3573943Z c47c3cfeb687: Download complete 2025-12-04T09:36:26.4407766Z 7d326b9e2673: Download complete 2025-12-04T09:36:26.5331117Z 7ec8f17141c8: Verifying Checksum 2025-12-04T09:36:26.5331697Z 7ec8f17141c8: Download complete 2025-12-04T09:36:26.6032682Z 26249ea175bf: Verifying Checksum 2025-12-04T09:36:26.6033362Z 26249ea175bf: Download complete 2025-12-04T09:36:26.6854862Z 5e8e9ccb36f3: Verifying Checksum 2025-12-04T09:36:26.6855574Z 5e8e9ccb36f3: Download complete 2025-12-04T09:36:26.7808976Z 5bc72d4e1de8: Verifying Checksum 2025-12-04T09:36:26.7809693Z 5bc72d4e1de8: Download complete 2025-12-04T09:36:26.8807716Z 83cddbd49779: Verifying Checksum 2025-12-04T09:36:26.8808139Z 83cddbd49779: Download complete 2025-12-04T09:36:26.9807695Z 60c25d8c3dd2: Download complete 2025-12-04T09:36:27.0573453Z a534dcf4b9a9: Verifying Checksum 2025-12-04T09:36:27.0573859Z a534dcf4b9a9: Download complete 2025-12-04T09:36:30.6460855Z 10138310c65c: Verifying Checksum 2025-12-04T09:36:30.6461283Z 
10138310c65c: Download complete 2025-12-04T09:36:30.7251536Z 8487679f252b: Verifying Checksum 2025-12-04T09:36:30.7251948Z 8487679f252b: Download complete 2025-12-04T09:36:30.8172006Z 52580ee2caa9: Download complete 2025-12-04T09:36:30.9446978Z 741c215cb2ff: Download complete 2025-12-04T09:36:31.0497519Z d17f5aba17a6: Verifying Checksum 2025-12-04T09:36:31.0498180Z d17f5aba17a6: Download complete 2025-12-04T09:36:31.1168580Z bc08246bb4ba: Download complete 2025-12-04T09:36:31.5538843Z 7323bf084bf9: Verifying Checksum 2025-12-04T09:36:31.5539456Z 7323bf084bf9: Download complete 2025-12-04T09:36:31.6505785Z d344ecc97fd7: Verifying Checksum 2025-12-04T09:36:31.6506220Z d344ecc97fd7: Download complete 2025-12-04T09:36:32.6191596Z fb60b2d2147f: Download complete 2025-12-04T09:36:46.7157719Z d172f25b97f7: Verifying Checksum 2025-12-04T09:36:46.7158156Z d172f25b97f7: Download complete 2025-12-04T09:37:17.5410826Z 5b09a2b135c8: Pull complete 2025-12-04T09:37:17.7544471Z 4f4fb700ef54: Pull complete 2025-12-04T09:37:17.9689668Z 5bfdaeb5578d: Pull complete 2025-12-04T09:37:18.2261689Z 0ef42867f370: Pull complete 2025-12-04T09:37:18.4542298Z 446083e497f3: Pull complete 2025-12-04T09:37:18.7282366Z d8a170bef0f4: Pull complete 2025-12-04T09:37:18.9445716Z e2b6cd6a5bd0: Pull complete 2025-12-04T09:37:19.1673229Z 93efc0181a22: Pull complete 2025-12-04T09:37:19.3875083Z 7454c938f174: Pull complete 2025-12-04T09:37:19.6060104Z 4d57ff55f6d4: Pull complete 2025-12-04T09:37:20.5723932Z b0301534b4a5: Verifying Checksum 2025-12-04T09:37:20.5724372Z b0301534b4a5: Download complete 2025-12-04T09:38:36.8002535Z b0301534b4a5: Pull complete 2025-12-04T09:38:37.0149409Z 1969e15d0c13: Pull complete 2025-12-04T09:38:37.8159429Z 73180a0f2d5a: Pull complete 2025-12-04T09:38:38.0375045Z ad81b25cb69f: Pull complete 2025-12-04T09:38:38.2693904Z 8165374f8dcc: Pull complete 2025-12-04T09:38:46.3434348Z 7779c0bb9be2: Pull complete 2025-12-04T09:38:46.5642926Z 4d0a1c027262: Pull complete 
2025-12-04T09:38:46.7895181Z a51e0dab2d59: Pull complete 2025-12-04T09:38:47.1166699Z 3eb6d4ff040b: Pull complete 2025-12-04T09:38:47.2914514Z b168858b8537: Pull complete 2025-12-04T09:38:47.7406934Z d77a39278026: Pull complete 2025-12-04T09:38:47.9656984Z 36fbd357280b: Pull complete 2025-12-04T09:38:48.1737088Z 4e3b10a5dd6a: Pull complete 2025-12-04T09:38:48.5776841Z 3092fab73b59: Pull complete 2025-12-04T09:38:48.7914997Z 20020dd28a15: Pull complete 2025-12-04T09:38:49.0219358Z ae5280ce969d: Pull complete 2025-12-04T09:38:49.4135588Z 026e4484b749: Pull complete 2025-12-04T09:38:49.6383296Z 1be9da2ce53d: Pull complete 2025-12-04T09:38:50.0297281Z 6481b7a1d9fb: Pull complete 2025-12-04T09:38:51.8398427Z fa519d18c39d: Pull complete 2025-12-04T09:39:51.5372647Z d172f25b97f7: Pull complete 2025-12-04T09:39:51.6565338Z fd60ab6b1c2c: Pull complete 2025-12-04T09:39:51.7640957Z 0afe45579c2c: Pull complete 2025-12-04T09:39:51.9945632Z 5884ffd6720b: Pull complete 2025-12-04T09:39:52.2521721Z ab7a7c316fa7: Pull complete 2025-12-04T09:39:52.4610155Z c7775ce5574b: Pull complete 2025-12-04T09:39:52.6448674Z 81945c4fb228: Pull complete 2025-12-04T09:39:52.7132800Z 663cbe24d60b: Pull complete 2025-12-04T09:39:52.7481517Z 43f216b02786: Pull complete 2025-12-04T09:39:52.8188782Z c47c3cfeb687: Pull complete 2025-12-04T09:39:52.8825677Z 7d326b9e2673: Pull complete 2025-12-04T09:39:52.9131833Z 7ec8f17141c8: Pull complete 2025-12-04T09:39:52.9780199Z 26249ea175bf: Pull complete 2025-12-04T09:39:53.0145600Z 5e8e9ccb36f3: Pull complete 2025-12-04T09:39:53.0838749Z 5bc72d4e1de8: Pull complete 2025-12-04T09:39:53.1184437Z 83cddbd49779: Pull complete 2025-12-04T09:39:53.1827405Z 60c25d8c3dd2: Pull complete 2025-12-04T09:39:53.2064660Z a534dcf4b9a9: Pull complete 2025-12-04T09:39:59.9416997Z 10138310c65c: Pull complete 2025-12-04T09:40:00.1087683Z 8487679f252b: Pull complete 2025-12-04T09:40:00.3427433Z 52580ee2caa9: Pull complete 2025-12-04T09:40:00.4696300Z 741c215cb2ff: Pull complete 
2025-12-04T09:40:00.5940866Z d17f5aba17a6: Pull complete 2025-12-04T09:40:00.7370839Z bc08246bb4ba: Pull complete 2025-12-04T09:40:02.2096628Z 7323bf084bf9: Pull complete 2025-12-04T09:40:02.4209394Z d344ecc97fd7: Pull complete 2025-12-04T09:40:04.3803409Z fb60b2d2147f: Pull complete 2025-12-04T09:40:04.5721823Z Digest: sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940 2025-12-04T09:40:04.5835755Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:40:04.5867092Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:40:04.5925927Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:04.5927098Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:04.5935194Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:04.5935659Z env: 2025-12-04T09:40:04.5935897Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:04.5936201Z ##[endgroup] 2025-12-04T09:40:04.6144505Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:40:04.6145010Z with: 2025-12-04T09:40:04.6145266Z driver-version: 525.105.17 2025-12-04T09:40:04.6145567Z env: 2025-12-04T09:40:04.6145795Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:04.6146096Z ##[endgroup] 2025-12-04T09:40:04.6169156Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:04.6170259Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 
2025-12-04T09:40:04.6177931Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:04.6178377Z env: 2025-12-04T09:40:04.6178630Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:04.6178938Z ##[endgroup] 2025-12-04T09:40:04.6238232Z ##[group]Run set -euo pipefail 2025-12-04T09:40:04.6238616Z set -euo pipefail 2025-12-04T09:40:04.6238970Z  2025-12-04T09:40:04.6239204Z has_gpu=false 2025-12-04T09:40:04.6239497Z devices="" 2025-12-04T09:40:04.6239764Z  2025-12-04T09:40:04.6240070Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:40:04.6240602Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:04.6241332Z  has_gpu=true 2025-12-04T09:40:04.6241684Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:04.6242054Z  fi 2025-12-04T09:40:04.6242381Z fi 2025-12-04T09:40:04.6242623Z  2025-12-04T09:40:04.6242875Z if [ "$has_gpu" = false ]; then 2025-12-04T09:40:04.6243345Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:04.6243807Z  has_gpu=true 2025-12-04T09:40:04.6244167Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:04.6244527Z  fi 2025-12-04T09:40:04.6244771Z fi 2025-12-04T09:40:04.6245053Z  2025-12-04T09:40:04.6245405Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:40:04.6246015Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:40:04.6246508Z  has_gpu=true 2025-12-04T09:40:04.6246857Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:40:04.6247215Z  fi 2025-12-04T09:40:04.6247462Z fi 2025-12-04T09:40:04.6247702Z  2025-12-04T09:40:04.6248047Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:40:04.6248689Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:40:04.6255228Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:04.6255662Z env: 2025-12-04T09:40:04.6256065Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:04.6256370Z ##[endgroup] 2025-12-04T09:40:06.1634160Z ##[group]Run 
if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:40:06.1634647Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:40:06.1635094Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:40:06.1635701Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:40:06.1636255Z else 2025-12-04T09:40:06.1636585Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:40:06.1636990Z fi 2025-12-04T09:40:06.1644327Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:40:06.1644775Z env: 2025-12-04T09:40:06.1645025Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:06.1645311Z HAS_NVIDIA: true 2025-12-04T09:40:06.1645575Z ##[endgroup] 2025-12-04T09:40:06.1724654Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:40:06.1725178Z with: 2025-12-04T09:40:06.1725416Z timeout_minutes: 10 2025-12-04T09:40:06.1725701Z max_attempts: 3 2025-12-04T09:40:06.1758868Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. 
/etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). 
Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. 
When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! 
| # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. 
This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:40:06.1792717Z retry_wait_seconds: 10 2025-12-04T09:40:06.1793047Z polling_interval_seconds: 1 2025-12-04T09:40:06.1793372Z warning_on_retry: true 2025-12-04T09:40:06.1793693Z continue_on_error: false 2025-12-04T09:40:06.1793995Z env: 2025-12-04T09:40:06.1794220Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:40:06.1794523Z HAS_NVIDIA_GPU: true 2025-12-04T09:40:06.1794886Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:40:06.1795304Z DRIVER_VERSION: 525.105.17 2025-12-04T09:40:06.1795605Z ##[endgroup] 2025-12-04T09:40:06.2980502Z == Installing nvidia driver NVIDIA-Linux-x86_64-525.105.17.run == 2025-12-04T09:40:06.2981675Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:40:06.2982084Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:40:06.8903296Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:40:06.8903795Z No packages marked for removal. 2025-12-04T09:40:06.8976909Z Dependencies resolved. 2025-12-04T09:40:06.8987758Z Nothing to do. 2025-12-04T09:40:06.8989434Z Complete! 
2025-12-04T09:40:06.9357966Z + install_nvidia_driver_common 2025-12-04T09:40:06.9361505Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:40:06.9361899Z Before installing NVIDIA driver 2025-12-04T09:40:06.9363306Z + lspci 2025-12-04T09:40:06.9552354Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:40:06.9552990Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:40:06.9553681Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:40:06.9554358Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:40:06.9554973Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:40:06.9555653Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:40:06.9556274Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:40:06.9556908Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:40:06.9557426Z + lsmod 2025-12-04T09:40:06.9597072Z Module Size Used by 2025-12-04T09:40:06.9597671Z nvidia_uvm 1925120 0 2025-12-04T09:40:06.9598033Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:40:06.9598397Z drm 602112 1 nvidia 2025-12-04T09:40:06.9598765Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:40:06.9599157Z backlight 24576 1 drm 2025-12-04T09:40:06.9599519Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:40:06.9599867Z xt_conntrack 16384 1 2025-12-04T09:40:06.9600190Z nft_chain_nat 16384 3 2025-12-04T09:40:06.9600510Z xt_MASQUERADE 20480 1 2025-12-04T09:40:06.9601019Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:40:06.9601442Z nf_conntrack_netlink 57344 0 2025-12-04T09:40:06.9601941Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:40:06.9602572Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:40:06.9602950Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:40:06.9603324Z xfrm_user 57344 1 2025-12-04T09:40:06.9603657Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:40:06.9604002Z xt_addrtype 16384 2 2025-12-04T09:40:06.9604324Z nft_compat 20480 4 2025-12-04T09:40:06.9604703Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:40:06.9605213Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:40:06.9605682Z br_netfilter 36864 0 2025-12-04T09:40:06.9606021Z bridge 323584 1 br_netfilter 2025-12-04T09:40:06.9606395Z stp 16384 1 bridge 2025-12-04T09:40:06.9606730Z llc 16384 2 bridge,stp 2025-12-04T09:40:06.9607079Z overlay 167936 0 2025-12-04T09:40:06.9607400Z tls 139264 0 2025-12-04T09:40:06.9607700Z nls_ascii 16384 1 2025-12-04T09:40:06.9608019Z nls_cp437 20480 1 2025-12-04T09:40:06.9608328Z vfat 24576 1 2025-12-04T09:40:06.9608629Z fat 86016 1 vfat 2025-12-04T09:40:06.9608965Z sunrpc 700416 1 2025-12-04T09:40:06.9609269Z i8042 45056 0 2025-12-04T09:40:06.9609561Z ena 184320 0 2025-12-04T09:40:06.9609874Z skx_edac_common 28672 0 
2025-12-04T09:40:06.9610195Z serio 28672 3 i8042 2025-12-04T09:40:06.9610540Z ghash_clmulni_intel 16384 0 2025-12-04T09:40:06.9610845Z button 24576 0 2025-12-04T09:40:06.9611156Z sch_fq_codel 20480 17 2025-12-04T09:40:06.9611474Z dm_mod 188416 0 2025-12-04T09:40:06.9611769Z fuse 184320 1 2025-12-04T09:40:06.9612077Z configfs 57344 1 2025-12-04T09:40:06.9612407Z loop 36864 0 2025-12-04T09:40:06.9612707Z dmi_sysfs 20480 0 2025-12-04T09:40:06.9613206Z crc32_pclmul 16384 0 2025-12-04T09:40:06.9613521Z crc32c_intel 24576 0 2025-12-04T09:40:06.9613826Z efivarfs 24576 1 2025-12-04T09:40:06.9614142Z + modinfo nvidia 2025-12-04T09:40:06.9616447Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:40:06.9617019Z import_ns: DMA_BUF 2025-12-04T09:40:06.9617309Z alias: char-major-195-* 2025-12-04T09:40:06.9617641Z version: 580.82.07 2025-12-04T09:40:06.9617950Z supported: external 2025-12-04T09:40:06.9618243Z license: Dual MIT/GPL 2025-12-04T09:40:06.9618605Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:40:06.9619028Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:40:06.9619415Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:40:06.9619828Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:40:06.9620266Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:40:06.9620690Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:40:06.9621126Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:40:06.9621724Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:40:06.9622154Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:40:06.9622555Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:40:06.9622944Z depends: i2c-core,drm 2025-12-04T09:40:06.9623260Z retpoline: Y 2025-12-04T09:40:06.9623513Z name: nvidia 2025-12-04T09:40:06.9623965Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:40:06.9624561Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 
2025-12-04T09:40:06.9625118Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:40:06.9625634Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:40:06.9626020Z parm: NVreg_RmLogonRC:int 2025-12-04T09:40:06.9626389Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:40:06.9626772Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:40:06.9627149Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:40:06.9627531Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:40:06.9627965Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:40:06.9628448Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:40:06.9628864Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:40:06.9629245Z parm: NVreg_EnableMSI:int 2025-12-04T09:40:06.9629611Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:40:06.9630057Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:40:06.9630544Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:40:06.9631003Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:40:06.9631505Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:40:06.9632013Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:40:06.9632520Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:40:06.9633026Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:40:06.9633447Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:40:06.9633907Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:40:06.9634357Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:40:06.9634779Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:40:06.9635184Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:40:06.9635580Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:40:06.9635981Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:40:06.9636370Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:40:06.9636785Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:40:06.9637231Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:40:06.9637673Z parm: 
NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:40:06.9638215Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:40:06.9638618Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:40:06.9639061Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:40:06.9639502Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:40:06.9639917Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:40:06.9640342Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:40:06.9640754Z parm: NVreg_RmMsg:charp 2025-12-04T09:40:06.9641099Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:40:06.9641503Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:40:06.9641908Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:40:06.9642383Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:40:06.9642795Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:40:06.9643236Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:40:06.9643671Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:40:06.9644068Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:40:06.9644494Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:40:06.9644987Z parm: rm_firmware_active:charp 2025-12-04T09:40:06.9645337Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:40:06.9645638Z ++ command -v nvidia-smi 2025-12-04T09:40:06.9645955Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:40:06.9646260Z + set +e 2025-12-04T09:40:06.9646641Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:40:08.4854083Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:40:08.4854512Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:40:08.4854819Z + '[' 0 -ne 0 ']' 2025-12-04T09:40:08.4855085Z + '[' 580.82.07 '!=' 525.105.17 ']' 2025-12-04T09:40:08.4855690Z + echo 'NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing' 2025-12-04T09:40:08.4856355Z + sudo killall nvidia-persistenced 2025-12-04T09:40:08.4856947Z NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. 
Continuing 2025-12-04T09:40:08.6337980Z nvidia-persistenced: no process found 2025-12-04T09:40:08.6356102Z + true 2025-12-04T09:40:08.6356386Z + set -e 2025-12-04T09:40:08.6356811Z + '[' 0 -eq 0 ']' 2025-12-04T09:40:08.6357078Z + '[' amzn2023 '!=' ubuntu20.04 ']' 2025-12-04T09:40:08.6357478Z + sudo yum groupinstall -y 'Development Tools' 2025-12-04T09:40:09.1498352Z Last metadata expiration check: 0:22:20 ago on Thu Dec 4 09:17:49 2025. 2025-12-04T09:40:09.1943778Z No match for group package "system-rpm-config" 2025-12-04T09:40:09.1964050Z No match for group package "rcs" 2025-12-04T09:40:09.1990305Z No match for group package "pkgconfig" 2025-12-04T09:40:09.2577623Z Dependencies resolved. 2025-12-04T09:40:09.2916783Z ================================================================================ 2025-12-04T09:40:09.2917365Z Package Architecture Version Repository Size 2025-12-04T09:40:09.2917900Z ================================================================================ 2025-12-04T09:40:09.2918319Z Installing Groups: 2025-12-04T09:40:09.2918707Z Development Tools 2025-12-04T09:40:09.2919071Z 2025-12-04T09:40:09.2919178Z Transaction Summary 2025-12-04T09:40:09.2919483Z ================================================================================ 2025-12-04T09:40:09.2919758Z 2025-12-04T09:40:09.5071444Z ================================================================================ 2025-12-04T09:40:09.5071918Z WARNING: 2025-12-04T09:40:09.5072225Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:40:09.5072516Z 2025-12-04T09:40:09.5072625Z Available Versions: 2025-12-04T09:40:09.5072819Z 2025-12-04T09:40:09.5072934Z Version 2023.9.20250929: 2025-12-04T09:40:09.5073328Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:40:09.5073652Z 2025-12-04T09:40:09.5073818Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:40:09.5074325Z 2025-12-04T09:40:09.5074427Z Release notes: 2025-12-04T09:40:09.5074963Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:40:09.5075441Z 2025-12-04T09:40:09.5075563Z Version 2023.9.20251014: 2025-12-04T09:40:09.5075954Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:40:09.5076276Z 2025-12-04T09:40:09.5076414Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:40:09.5076698Z 2025-12-04T09:40:09.5076795Z Release notes: 2025-12-04T09:40:09.5077292Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:40:09.5077762Z 2025-12-04T09:40:09.5077867Z Version 2023.9.20251020: 2025-12-04T09:40:09.5078249Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:40:09.5078578Z 2025-12-04T09:40:09.5078714Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:40:09.5078970Z 2025-12-04T09:40:09.5079084Z Release notes: 2025-12-04T09:40:09.5079565Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:40:09.5080043Z 2025-12-04T09:40:09.5080278Z Version 2023.9.20251027: 2025-12-04T09:40:09.5080667Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:40:09.5080983Z 2025-12-04T09:40:09.5081132Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:40:09.5081390Z 2025-12-04T09:40:09.5081488Z Release notes: 2025-12-04T09:40:09.5081985Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:40:09.5082527Z 2025-12-04T09:40:09.5082645Z Version 2023.9.20251105: 
2025-12-04T09:40:09.5083012Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:40:09.5083345Z 2025-12-04T09:40:09.5083482Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:40:09.5083759Z 2025-12-04T09:40:09.5083859Z Release notes: 2025-12-04T09:40:09.5084351Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:40:09.5084824Z 2025-12-04T09:40:09.5084929Z Version 2023.9.20251110: 2025-12-04T09:40:09.5085319Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:40:09.5085638Z 2025-12-04T09:40:09.5085787Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:40:09.5086048Z 2025-12-04T09:40:09.5086149Z Release notes: 2025-12-04T09:40:09.5086641Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:40:09.5087119Z 2025-12-04T09:40:09.5087223Z Version 2023.9.20251117: 2025-12-04T09:40:09.5087602Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:40:09.5087916Z 2025-12-04T09:40:09.5088050Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:40:09.5088321Z 2025-12-04T09:40:09.5088420Z Release notes: 2025-12-04T09:40:09.5088911Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:40:09.5089384Z 2025-12-04T09:40:09.5089527Z ================================================================================ 2025-12-04T09:40:09.5089903Z Complete! 2025-12-04T09:40:09.5539006Z ++ uname -r 2025-12-04T09:40:09.5549528Z + sudo yum install -y 'kernel-devel-uname-r == 6.1.150-174.273.amzn2023.x86_64' 2025-12-04T09:40:10.0989915Z Last metadata expiration check: 0:22:21 ago on Thu Dec 4 09:17:49 2025. 2025-12-04T09:40:10.1294192Z Using '==' operator in reldeps can result in an undefined behavior. It is deprecated and the support will be dropped in future versions. Use '=' operator instead. 
2025-12-04T09:40:10.1418781Z Package kernel-devel-1:6.1.150-174.273.amzn2023.x86_64 is already installed. 2025-12-04T09:40:10.2036199Z Dependencies resolved. 2025-12-04T09:40:10.2371316Z Nothing to do. 2025-12-04T09:40:10.2372018Z Complete! 2025-12-04T09:40:10.2795064Z + sudo modprobe backlight 2025-12-04T09:40:10.4189862Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-525.105.17.run 2025-12-04T09:40:14.7857630Z + set +e 2025-12-04T09:40:14.7858131Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm 2025-12-04T09:40:16.2585947Z Verifying archive integrity... OK 2025-12-04T09:40:43.6226299Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.105.17................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 2025-12-04T09:40:44.1638808Z 2025-12-04T09:40:44.1639724Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. 
2025-12-04T09:40:44.1640424Z 2025-12-04T09:41:10.0974212Z 2025-12-04T09:41:10.0976287Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver. 2025-12-04T09:41:10.0977954Z 2025-12-04T09:41:10.0994053Z 2025-12-04T09:41:10.0995488Z WARNING: This NVIDIA driver package includes Vulkan components, but no Vulkan ICD loader was detected on this system. The NVIDIA Vulkan ICD will not function without the loader. Most distributions package the Vulkan loader; try installing the "vulkan-loader", "vulkan-icd-loader", or "libvulkan1" package. 2025-12-04T09:41:10.0996965Z 2025-12-04T09:41:21.5384228Z + NVIDIA_INSTALLATION_STATUS=0 2025-12-04T09:41:21.5384657Z + RESET_GPU=0 2025-12-04T09:41:21.5384937Z + '[' 0 -ne 0 ']' 2025-12-04T09:41:21.5386324Z ++ command -v nvidia-smi 2025-12-04T09:41:21.5389482Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:41:21.5393319Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:41:24.1232702Z + INSTALLED_DRIVER_VERSION=525.105.17 2025-12-04T09:41:24.1233131Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:41:24.1233413Z + '[' 0 -ne 0 ']' 2025-12-04T09:41:24.1233675Z + '[' 0 -eq 1 ']' 2025-12-04T09:41:24.1234089Z + sudo rm -fv /tmp/nvidia_driver 2025-12-04T09:41:24.2705427Z removed '/tmp/nvidia_driver' 2025-12-04T09:41:24.2724880Z + set -e 2025-12-04T09:41:24.2727304Z + post_install_nvidia_driver_common 2025-12-04T09:41:24.2730946Z + sudo modprobe nvidia 2025-12-04T09:41:24.4685087Z + echo 'After installing NVIDIA driver' 2025-12-04T09:41:24.4685777Z + lspci 2025-12-04T09:41:24.4686195Z After installing NVIDIA driver 2025-12-04T09:41:24.4820713Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:41:24.4821362Z 
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:41:24.4822072Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:41:24.4822730Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:41:24.4823341Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:41:24.4824020Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:41:24.4824648Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:41:24.4825652Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:41:24.4826183Z + lsmod 2025-12-04T09:41:24.4850576Z Module Size Used by 2025-12-04T09:41:24.4850929Z nvidia 56537088 0 2025-12-04T09:41:24.4851249Z drm 602112 1 nvidia 2025-12-04T09:41:24.4851634Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:41:24.4852003Z backlight 24576 1 drm 2025-12-04T09:41:24.4852362Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:41:24.4852724Z xt_conntrack 16384 1 2025-12-04T09:41:24.4853035Z nft_chain_nat 16384 3 2025-12-04T09:41:24.4853351Z xt_MASQUERADE 20480 1 2025-12-04T09:41:24.4853718Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:41:24.4854127Z nf_conntrack_netlink 57344 0 2025-12-04T09:41:24.4854630Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:41:24.4855191Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:41:24.4855743Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:41:24.4856106Z xfrm_user 57344 1 2025-12-04T09:41:24.4856440Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:41:24.4856806Z xt_addrtype 16384 2 2025-12-04T09:41:24.4857129Z nft_compat 20480 4 2025-12-04T09:41:24.4857498Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:41:24.4858027Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:41:24.4858500Z br_netfilter 36864 0 
2025-12-04T09:41:24.4858828Z bridge 323584 1 br_netfilter 2025-12-04T09:41:24.4859197Z stp 16384 1 bridge 2025-12-04T09:41:24.4859550Z llc 16384 2 bridge,stp 2025-12-04T09:41:24.4859888Z overlay 167936 0 2025-12-04T09:41:24.4860196Z tls 139264 0 2025-12-04T09:41:24.4860509Z nls_ascii 16384 1 2025-12-04T09:41:24.4860804Z nls_cp437 20480 1 2025-12-04T09:41:24.4861118Z vfat 24576 1 2025-12-04T09:41:24.4861425Z fat 86016 1 vfat 2025-12-04T09:41:24.4861754Z sunrpc 700416 1 2025-12-04T09:41:24.4862045Z i8042 45056 0 2025-12-04T09:41:24.4862347Z ena 184320 0 2025-12-04T09:41:24.4862661Z skx_edac_common 28672 0 2025-12-04T09:41:24.4862967Z serio 28672 3 i8042 2025-12-04T09:41:24.4863312Z ghash_clmulni_intel 16384 0 2025-12-04T09:41:24.4863630Z button 24576 0 2025-12-04T09:41:24.4863928Z sch_fq_codel 20480 17 2025-12-04T09:41:24.4864243Z dm_mod 188416 0 2025-12-04T09:41:24.4864544Z fuse 184320 1 2025-12-04T09:41:24.4864835Z configfs 57344 1 2025-12-04T09:41:24.4865143Z loop 36864 0 2025-12-04T09:41:24.4865451Z dmi_sysfs 20480 0 2025-12-04T09:41:24.4865752Z crc32_pclmul 16384 0 2025-12-04T09:41:24.4866066Z crc32c_intel 24576 0 2025-12-04T09:41:24.4866382Z efivarfs 24576 1 2025-12-04T09:41:24.4866690Z + modinfo nvidia 2025-12-04T09:41:24.4868117Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:41:24.4868731Z firmware: nvidia/525.105.17/gsp_tu10x.bin 2025-12-04T09:41:24.4869160Z firmware: nvidia/525.105.17/gsp_ad10x.bin 2025-12-04T09:41:24.4869549Z alias: char-major-195-* 2025-12-04T09:41:24.4869881Z version: 525.105.17 2025-12-04T09:41:24.4870190Z supported: external 2025-12-04T09:41:24.4870475Z license: NVIDIA 2025-12-04T09:41:24.4870776Z srcversion: 98F82D76E0EF3952EEE57A7 2025-12-04T09:41:24.4871172Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:41:24.4871598Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:41:24.4872005Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:41:24.4872498Z depends: 
i2c-core,drm 2025-12-04T09:41:24.4872831Z retpoline: Y 2025-12-04T09:41:24.4873091Z name: nvidia 2025-12-04T09:41:24.4873546Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:41:24.4874146Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:41:24.4874708Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:41:24.4875222Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:41:24.4875610Z parm: NVreg_RmLogonRC:int 2025-12-04T09:41:24.4875983Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:41:24.4876361Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:41:24.4876741Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:41:24.4877120Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:41:24.4877554Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:41:24.4878034Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:41:24.4878450Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:41:24.4878897Z parm: NVreg_EnableMSI:int 2025-12-04T09:41:24.4879263Z parm: NVreg_TCEBypassMode:int 2025-12-04T09:41:24.4879661Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:41:24.4880114Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:41:24.4880590Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:41:24.4881065Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:41:24.4881577Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:41:24.4882070Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:41:24.4882659Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:41:24.4883169Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:41:24.4883575Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:41:24.4884100Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:41:24.4884559Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:41:24.4884986Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:41:24.4885367Z parm: NVreg_KMallocHeapMaxSize:int 
2025-12-04T09:41:24.4885775Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:41:24.4886178Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:41:24.4886555Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:41:24.4886985Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:41:24.4887429Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:41:24.4887830Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:41:24.4888244Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:41:24.4888668Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:41:24.4889084Z parm: NVreg_RmMsg:charp 2025-12-04T09:41:24.4889433Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:41:24.4889840Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:41:24.4890243Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:41:24.4890625Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:41:24.4891012Z parm: rm_firmware_active:charp 2025-12-04T09:41:24.4891359Z + set +e 2025-12-04T09:41:24.4891580Z + nvidia-smi 2025-12-04T09:41:26.4660911Z Thu Dec 4 09:41:26 2025 2025-12-04T09:41:26.4661457Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:26.4662089Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:41:26.4662685Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:26.4663274Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:41:26.4663930Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:41:26.4664464Z | | | MIG M. 
| 2025-12-04T09:41:26.4665178Z |===============================+======================+======================| 2025-12-04T09:41:26.4739958Z | 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:41:26.4740512Z | N/A 25C P0 27W / 70W | 2MiB / 15360MiB | 4% Default | 2025-12-04T09:41:26.4741071Z | | | N/A | 2025-12-04T09:41:26.4741525Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:26.4742000Z 2025-12-04T09:41:26.4742460Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:26.4742967Z | Processes: | 2025-12-04T09:41:26.4743481Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:41:26.4743985Z | ID ID Usage | 2025-12-04T09:41:26.4744608Z |=============================================================================| 2025-12-04T09:41:26.4745133Z | No running processes found | 2025-12-04T09:41:26.4745687Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:26.9276738Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T09:41:28.8952110Z Tesla T4 2025-12-04T09:41:29.2969546Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:41:29.2969906Z + '[' 0 -eq 0 ']' 2025-12-04T09:41:29.2970193Z + echo 'INFO: Ignoring allowed status 0' 2025-12-04T09:41:29.2970554Z + set -e 2025-12-04T09:41:29.2970809Z INFO: Ignoring allowed status 0 2025-12-04T09:41:29.2977034Z == Installing nvidia container toolkit for amzn2023 == 2025-12-04T09:41:29.2981205Z + sudo yum install -y yum-utils 2025-12-04T09:41:29.8140999Z Last metadata expiration check: 0:23:40 ago on Thu Dec 4 09:17:49 2025. 2025-12-04T09:41:29.8470924Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed. 2025-12-04T09:41:29.9081594Z Dependencies resolved. 2025-12-04T09:41:29.9418563Z Nothing to do. 2025-12-04T09:41:29.9419845Z Complete! 
2025-12-04T09:41:30.0551755Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]] 2025-12-04T09:41:30.0552527Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:30.0553652Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:30.4319172Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo 2025-12-04T09:41:30.4911181Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8 2025-12-04T09:41:31.1450748Z nvidia-container-toolkit 19 kB/s | 833 B 00:00 2025-12-04T09:41:31.2485703Z Dependencies resolved. 2025-12-04T09:41:31.2822789Z ================================================================================ 2025-12-04T09:41:31.2823329Z Package Arch Version Repository Size 2025-12-04T09:41:31.2823809Z ================================================================================ 2025-12-04T09:41:31.2824179Z Downgrading: 2025-12-04T09:41:31.2824641Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k 2025-12-04T09:41:31.2825361Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M 2025-12-04T09:41:31.2826055Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M 2025-12-04T09:41:31.2826804Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M 2025-12-04T09:41:31.2827266Z 2025-12-04T09:41:31.2827371Z Transaction Summary 2025-12-04T09:41:31.2827673Z ================================================================================ 2025-12-04T09:41:31.2828314Z Downgrade 4 Packages 2025-12-04T09:41:31.2828509Z 2025-12-04T09:41:31.2828636Z Total download size: 8.0 M 2025-12-04T09:41:31.2830158Z Downloading Packages: 2025-12-04T09:41:31.3236570Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 1.0 MB/s | 40 kB 00:00 
2025-12-04T09:41:31.3814796Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm 10 MB/s | 1.0 MB 00:00 2025-12-04T09:41:31.4351010Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.2 MB/s | 1.2 MB 00:00 2025-12-04T09:41:31.5698092Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 23 MB/s | 5.8 MB 00:00 2025-12-04T09:41:31.5710379Z -------------------------------------------------------------------------------- 2025-12-04T09:41:31.5715508Z Total 28 MB/s | 8.0 MB 00:00 2025-12-04T09:41:31.5719210Z Running transaction check 2025-12-04T09:41:31.5879711Z Transaction check succeeded. 2025-12-04T09:41:31.5880076Z Running transaction test 2025-12-04T09:41:31.6438864Z Transaction test succeeded. 2025-12-04T09:41:31.6443402Z Running transaction 2025-12-04T09:41:32.6546223Z Preparing : 1/1 2025-12-04T09:41:32.8027049Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8 2025-12-04T09:41:32.8313633Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:41:32.9098275Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8 2025-12-04T09:41:33.0689705Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8 2025-12-04T09:41:33.0995427Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:41:33.1688647Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8 2025-12-04T09:41:33.1758470Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:33.1759614Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:33.2077231Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8 2025-12-04T09:41:33.2139712Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:33.2140855Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:33.2520573Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8 2025-12-04T09:41:33.2591447Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:33.2592640Z Cleanup : 
libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:33.2949463Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8 2025-12-04T09:41:33.3015457Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:33.3016877Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:33.3424797Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:33.3997043Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8 2025-12-04T09:41:34.8919455Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8 2025-12-04T09:41:34.8920241Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8 2025-12-04T09:41:34.8920929Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8 2025-12-04T09:41:34.8921601Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8 2025-12-04T09:41:34.8922298Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8 2025-12-04T09:41:34.8922969Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8 2025-12-04T09:41:34.8923633Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8 2025-12-04T09:41:34.8925083Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8 2025-12-04T09:41:35.0554752Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8================================================================================ 2025-12-04T09:41:35.0555479Z WARNING: 2025-12-04T09:41:35.0555789Z A newer release of "Amazon Linux" is available. 
2025-12-04T09:41:35.0556078Z 2025-12-04T09:41:35.0556201Z Available Versions: 2025-12-04T09:41:35.0556383Z 2025-12-04T09:41:35.0556489Z Version 2023.9.20250929: 2025-12-04T09:41:35.0556881Z Run the following command to upgrade to 2023.9.20250929: 2025-12-04T09:41:35.0557218Z 2025-12-04T09:41:35.0557383Z dnf upgrade --releasever=2023.9.20250929 2025-12-04T09:41:35.0557650Z 2025-12-04T09:41:35.0557767Z Release notes: 2025-12-04T09:41:35.0558270Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html 2025-12-04T09:41:35.0558772Z 2025-12-04T09:41:35.0558878Z Version 2023.9.20251014: 2025-12-04T09:41:35.0559504Z Run the following command to upgrade to 2023.9.20251014: 2025-12-04T09:41:35.0559836Z 2025-12-04T09:41:35.0559977Z dnf upgrade --releasever=2023.9.20251014 2025-12-04T09:41:35.0560257Z 2025-12-04T09:41:35.0560357Z Release notes: 2025-12-04T09:41:35.0560861Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html 2025-12-04T09:41:35.0561330Z 2025-12-04T09:41:35.0561451Z Version 2023.9.20251020: 2025-12-04T09:41:35.0561826Z Run the following command to upgrade to 2023.9.20251020: 2025-12-04T09:41:35.0562225Z 2025-12-04T09:41:35.0562366Z dnf upgrade --releasever=2023.9.20251020 2025-12-04T09:41:35.0562630Z 2025-12-04T09:41:35.0562748Z Release notes: 2025-12-04T09:41:35.0563232Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html 2025-12-04T09:41:35.0563723Z 2025-12-04T09:41:35.0563835Z Version 2023.9.20251027: 2025-12-04T09:41:35.0564223Z Run the following command to upgrade to 2023.9.20251027: 2025-12-04T09:41:35.0564540Z 2025-12-04T09:41:35.0564694Z dnf upgrade --releasever=2023.9.20251027 2025-12-04T09:41:35.0564956Z 2025-12-04T09:41:35.0565059Z Release notes: 2025-12-04T09:41:35.0565550Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html 2025-12-04T09:41:35.0566022Z 2025-12-04T09:41:35.0566140Z Version 2023.9.20251105: 
2025-12-04T09:41:35.0566518Z Run the following command to upgrade to 2023.9.20251105: 2025-12-04T09:41:35.0566835Z 2025-12-04T09:41:35.0566972Z dnf upgrade --releasever=2023.9.20251105 2025-12-04T09:41:35.0567243Z 2025-12-04T09:41:35.0567339Z Release notes: 2025-12-04T09:41:35.0567826Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html 2025-12-04T09:41:35.0568291Z 2025-12-04T09:41:35.0568394Z Version 2023.9.20251110: 2025-12-04T09:41:35.0568772Z Run the following command to upgrade to 2023.9.20251110: 2025-12-04T09:41:35.0569104Z 2025-12-04T09:41:35.0569239Z dnf upgrade --releasever=2023.9.20251110 2025-12-04T09:41:35.0569502Z 2025-12-04T09:41:35.0569612Z Release notes: 2025-12-04T09:41:35.0570090Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html 2025-12-04T09:41:35.0570574Z 2025-12-04T09:41:35.0570676Z Version 2023.9.20251117: 2025-12-04T09:41:35.0571057Z Run the following command to upgrade to 2023.9.20251117: 2025-12-04T09:41:35.0571370Z 2025-12-04T09:41:35.0571503Z dnf upgrade --releasever=2023.9.20251117 2025-12-04T09:41:35.0571778Z 2025-12-04T09:41:35.0571877Z Release notes: 2025-12-04T09:41:35.0572368Z https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html 2025-12-04T09:41:35.0572831Z 2025-12-04T09:41:35.0572979Z ================================================================================ 2025-12-04T09:41:35.1239767Z 2025-12-04T09:41:35.1240198Z 2025-12-04T09:41:35.1240296Z Downgraded: 2025-12-04T09:41:35.1240755Z libnvidia-container-tools-1.17.8-1.x86_64 2025-12-04T09:41:35.1241479Z libnvidia-container1-1.17.8-1.x86_64 2025-12-04T09:41:35.1242214Z nvidia-container-toolkit-1.17.8-1.x86_64 2025-12-04T09:41:35.1242951Z nvidia-container-toolkit-base-1.17.8-1.x86_64 2025-12-04T09:41:35.1243385Z 2025-12-04T09:41:35.1243494Z Complete! 
2025-12-04T09:41:35.1814760Z + sudo systemctl restart docker 2025-12-04T09:41:41.1111468Z Thu Dec 4 09:41:41 2025 2025-12-04T09:41:41.1111969Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:41.1112576Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:41:41.1113170Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:41.1113812Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:41:41.1114753Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:41:41.1115286Z | | | MIG M. | 2025-12-04T09:41:41.1115693Z |===============================+======================+======================| 2025-12-04T09:41:41.1210757Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:41:41.1211359Z | N/A 25C P0 27W / 70W | 2MiB / 15360MiB | 7% Default | 2025-12-04T09:41:41.1211827Z | | | N/A | 2025-12-04T09:41:41.1212295Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:41.1212777Z 2025-12-04T09:41:41.1213353Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:41.1213959Z | Processes: | 2025-12-04T09:41:41.1214511Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:41:41.1214996Z | ID ID Usage | 2025-12-04T09:41:41.1215419Z |=============================================================================| 2025-12-04T09:41:41.1215938Z | No running processes found | 2025-12-04T09:41:41.1216507Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:41.1946955Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally 2025-12-04T09:41:41.4215879Z 3.13: Pulling from docker/library/python 2025-12-04T09:41:41.5196518Z 53c88f1dfeb7: Pulling fs layer 2025-12-04T09:41:41.5197003Z eae668646f44: Pulling fs layer 2025-12-04T09:41:41.5197377Z 
ff2e6e687b6c: Pulling fs layer 2025-12-04T09:41:41.5197764Z 7c40a3faff76: Pulling fs layer 2025-12-04T09:41:41.5198097Z 967a3b1c8fef: Pulling fs layer 2025-12-04T09:41:41.5198435Z a64e1a44f22a: Pulling fs layer 2025-12-04T09:41:41.5198748Z 52655f8a5bcc: Pulling fs layer 2025-12-04T09:41:41.5199161Z 7c40a3faff76: Waiting 2025-12-04T09:41:41.5199471Z 967a3b1c8fef: Waiting 2025-12-04T09:41:41.5199735Z a64e1a44f22a: Waiting 2025-12-04T09:41:41.5200003Z 52655f8a5bcc: Waiting 2025-12-04T09:41:41.7085887Z eae668646f44: Verifying Checksum 2025-12-04T09:41:41.7086523Z eae668646f44: Download complete 2025-12-04T09:41:41.8051150Z 53c88f1dfeb7: Verifying Checksum 2025-12-04T09:41:41.8051593Z 53c88f1dfeb7: Download complete 2025-12-04T09:41:41.8872854Z 967a3b1c8fef: Verifying Checksum 2025-12-04T09:41:41.8873250Z 967a3b1c8fef: Download complete 2025-12-04T09:41:41.9177237Z ff2e6e687b6c: Verifying Checksum 2025-12-04T09:41:41.9177668Z ff2e6e687b6c: Download complete 2025-12-04T09:41:41.9754487Z 52655f8a5bcc: Verifying Checksum 2025-12-04T09:41:41.9755300Z 52655f8a5bcc: Download complete 2025-12-04T09:41:42.0865636Z a64e1a44f22a: Verifying Checksum 2025-12-04T09:41:42.0866107Z a64e1a44f22a: Download complete 2025-12-04T09:41:42.8753600Z 7c40a3faff76: Verifying Checksum 2025-12-04T09:41:42.8754030Z 7c40a3faff76: Download complete 2025-12-04T09:41:43.2858492Z 53c88f1dfeb7: Pull complete 2025-12-04T09:41:43.8822860Z eae668646f44: Pull complete 2025-12-04T09:41:45.8967448Z ff2e6e687b6c: Pull complete 2025-12-04T09:41:51.6887848Z 7c40a3faff76: Pull complete 2025-12-04T09:41:51.9209803Z 967a3b1c8fef: Pull complete 2025-12-04T09:41:52.5760960Z a64e1a44f22a: Pull complete 2025-12-04T09:41:52.5975061Z 52655f8a5bcc: Pull complete 2025-12-04T09:41:52.6105395Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T09:41:52.6145981Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13 2025-12-04T09:41:59.8173066Z Thu Dec 4 09:41:59 
2025 2025-12-04T09:41:59.8173548Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:59.8174449Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T09:41:59.8175055Z |-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:59.8175659Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T09:41:59.8176303Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T09:41:59.8176839Z | | | MIG M. | 2025-12-04T09:41:59.8177247Z |===============================+======================+======================| 2025-12-04T09:41:59.8327708Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T09:41:59.8328247Z | N/A 25C P8 11W / 70W | 2MiB / 15360MiB | 0% Default | 2025-12-04T09:41:59.8328735Z | | | N/A | 2025-12-04T09:41:59.8329265Z +-------------------------------+----------------------+----------------------+ 2025-12-04T09:41:59.8329766Z 2025-12-04T09:41:59.8330225Z +-----------------------------------------------------------------------------+ 2025-12-04T09:41:59.8330871Z | Processes: | 2025-12-04T09:41:59.8331435Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T09:41:59.8331910Z | ID ID Usage | 2025-12-04T09:41:59.8332346Z |=============================================================================| 2025-12-04T09:41:59.8332870Z | No running processes found | 2025-12-04T09:41:59.8333431Z +-----------------------------------------------------------------------------+ 2025-12-04T09:42:01.3324228Z Command completed after 1 attempt(s). 
2025-12-04T09:42:01.3423773Z Prepare all required actions 2025-12-04T09:42:01.3458419Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T09:42:01.3458814Z with: 2025-12-04T09:42:01.3459515Z github-token: *** 2025-12-04T09:42:01.3459791Z env: 2025-12-04T09:42:01.3460021Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:01.3460342Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:01.3460717Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:01.3461139Z ##[endgroup] 2025-12-04T09:42:01.3477144Z ##[group]Run set -eux 2025-12-04T09:42:01.3477450Z set -eux 2025-12-04T09:42:01.3477977Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T09:42:01.3489938Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:01.3490372Z env: 2025-12-04T09:42:01.3490621Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:01.3490931Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:01.3491506Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:01.3492118Z GITHUB_TOKEN: *** 2025-12-04T09:42:01.3492375Z ##[endgroup] 2025-12-04T09:42:01.3527785Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-03bbda7791efb68ed 2025-12-04T09:42:03.3928672Z Setting output job-id=57119749427 2025-12-04T09:42:03.3930479Z Setting output job-name=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:03.4062013Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:42:03.4062910Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84 2025-12-04T09:42:03.4064065Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 & 2025-12-04T09:42:03.4065084Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:03.4071854Z shell: /usr/bin/bash 
--noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:03.4072291Z env: 2025-12-04T09:42:03.4072549Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:03.4072869Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:03.4073228Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:03.4073652Z JOB_ID: 57119749427 2025-12-04T09:42:03.4074419Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:03.4075223Z WORKFLOW_NAME: periodic 2025-12-04T09:42:03.4075543Z WORKFLOW_RUN_ID: 19922826259 2025-12-04T09:42:03.4075870Z MONITOR_LOG_INTERVAL: 5 2025-12-04T09:42:03.4076173Z MONITOR_DATA_COLLECT_INTERVAL: 1 2025-12-04T09:42:03.4076513Z ##[endgroup] 2025-12-04T09:42:03.7321083Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:42:04.1464489Z Collecting psutil==5.9.8 2025-12-04T09:42:04.1659122Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB) 2025-12-04T09:42:04.2507907Z Collecting dataclasses_json==0.6.7 2025-12-04T09:42:04.2552969Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB) 2025-12-04T09:42:04.2860830Z Collecting nvidia-ml-py==11.525.84 2025-12-04T09:42:04.2902558Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB) 2025-12-04T09:42:04.4228915Z Collecting marshmallow<4.0.0,>=3.18.0 2025-12-04T09:42:04.4271966Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB) 2025-12-04T09:42:04.4528198Z Collecting typing-inspect<1,>=0.4.0 2025-12-04T09:42:04.4568199Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-12-04T09:42:04.5184047Z Collecting packaging>=17.0 2025-12-04T09:42:04.5225174Z Downloading packaging-25.0-py3-none-any.whl (66 kB) 2025-12-04T09:42:04.5498068Z Collecting mypy-extensions>=0.3.0 2025-12-04T09:42:04.5537748Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-12-04T09:42:04.6071953Z 
Collecting typing-extensions>=3.7.4 2025-12-04T09:42:04.6111627Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB) 2025-12-04T09:42:04.7156979Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json 2025-12-04T09:42:05.0330872Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0 2025-12-04T09:42:05.2379579Z Prepare all required actions 2025-12-04T09:42:05.2380046Z Getting action download info 2025-12-04T09:42:05.4096234Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T09:42:05.6495293Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T09:42:06.0164772Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T09:42:06.0165358Z with: 2025-12-04T09:42:06.0165693Z name: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:06.0166088Z s3-bucket: gha-artifacts 2025-12-04T09:42:06.0166379Z env: 2025-12-04T09:42:06.0166626Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:06.0166941Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:06.0167314Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:06.0167718Z ##[endgroup] 2025-12-04T09:42:06.0200133Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:42:06.0200536Z with: 2025-12-04T09:42:06.0201139Z name: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:06.0201520Z s3-bucket: gha-artifacts 2025-12-04T09:42:06.0201829Z region: us-east-1 2025-12-04T09:42:06.0202213Z env: 2025-12-04T09:42:06.0202448Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:06.0202756Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:06.0203131Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:06.0203536Z ##[endgroup] 2025-12-04T09:42:06.5658141Z (node:68884) 
NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:42:06.5658740Z 2025-12-04T09:42:06.5658979Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:42:06.5659624Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:42:06.5660291Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:42:06.8644352Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.4-py3.10-gcc11/ 2025-12-04T09:42:06.8645240Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:42:14.9231127Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:42:14.9237791Z Artifact download has finished successfully 2025-12-04T09:42:14.9440238Z ##[group]Run unzip -o artifacts.zip 2025-12-04T09:42:14.9440625Z unzip -o artifacts.zip 2025-12-04T09:42:14.9448066Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:14.9448513Z env: 2025-12-04T09:42:14.9448746Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:14.9449057Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:14.9449423Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:14.9449828Z ##[endgroup] 2025-12-04T09:42:14.9521289Z Archive: artifacts.zip 2025-12-04T09:42:14.9522938Z creating: dist/ 2025-12-04T09:42:16.9691753Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:42:16.9835582Z inflating: dist/.ninja_log 2025-12-04T09:42:16.9836414Z creating: build/custom_test_artifacts/ 2025-12-04T09:42:16.9836940Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T09:42:16.9837510Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T09:42:16.9838217Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:16.9846110Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:16.9846949Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:16.9847728Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:16.9848584Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:16.9849417Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:16.9850818Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:16.9852093Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:16.9853008Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:16.9853893Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:16.9854926Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:16.9856415Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:16.9857918Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:16.9859001Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:16.9860662Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:16.9862641Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:16.9863617Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:16.9864486Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:16.9926300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:16.9990772Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:16.9992078Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:17.0059707Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:17.0060954Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:17.0062232Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:17.0063539Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:17.0064811Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:42:17.0066041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:17.0067278Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:17.0068505Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:17.0069716Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:42:17.0070857Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:17.0071948Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:17.0073033Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:17.0074117Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:17.0075168Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:17.0076462Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:17.0154305Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:17.0155671Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:17.0237359Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:17.0238604Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:42:17.0239345Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:17.0240105Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:17.0240921Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:42:17.0241829Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:42:17.0242888Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:42:17.0243862Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 
2025-12-04T09:42:17.0244783Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:42:17.0245719Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:42:17.0246644Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:42:17.0247588Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:42:17.0248527Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:42:17.0249453Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:42:17.0267186Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:42:17.0485771Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:42:17.0486655Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:42:17.0487635Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:42:17.0488708Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:42:17.0489725Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:42:17.0490684Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:42:17.0491683Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:42:17.0492684Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:42:17.0493662Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:42:17.0494659Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:42:17.0495641Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:42:17.0514624Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:42:17.0604307Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:42:17.0605590Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:17.0606540Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:17.0607391Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:42:17.0608163Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:42:17.0608934Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:17.0609852Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:42:17.0612237Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:42:17.0613077Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:42:17.0613769Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:42:17.0804944Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:42:17.0867255Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:42:17.0867859Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:42:17.0868426Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 
2025-12-04T09:42:17.0869111Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:17.0876621Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:17.0877411Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:17.0878188Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:17.0879027Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:17.0879839Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:17.0880900Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:17.0882463Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:17.0883376Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:17.0884239Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:17.0885065Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:17.0887011Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:17.0888479Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:17.0889529Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:17.0891148Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:17.0893258Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:17.0894213Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:17.0895063Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:17.0956822Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:17.1021154Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:17.1022426Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:17.1090206Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:17.1091445Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:17.1092704Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:17.1093977Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:17.1095331Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:42:17.1096554Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:17.1097791Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:17.1098994Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:17.1100192Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 
2025-12-04T09:42:17.1101488Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:17.1102584Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:17.1103648Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:17.1104726Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:17.1105776Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:17.1106839Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:17.1184913Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:17.1185856Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:17.1267767Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:17.1268712Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:42:17.1269425Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:17.1270187Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:17.1270998Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:42:17.1271925Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:42:17.1272961Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:42:17.1273957Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:42:17.1274877Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:42:17.1275840Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:42:17.1276807Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:42:17.1277775Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:42:17.1278749Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:42:17.1279909Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:42:17.1297597Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:42:17.1367619Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:42:17.1368650Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:17.1369769Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:17.1370589Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:42:17.1371361Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:42:17.1372115Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:17.1372879Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:42:17.1375416Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:42:17.1376253Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 
2025-12-04T09:42:17.1376942Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:42:17.1420659Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:42:17.1421296Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:42:17.1421915Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:42:17.1422667Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:42:17.1430070Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:42:17.1430930Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:42:17.1431771Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:42:17.1432687Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:42:17.1433580Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:42:17.1434602Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:42:17.1435871Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:42:17.1436854Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:42:17.1437796Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:42:17.1438712Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:42:17.1440302Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:42:17.1441810Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:42:17.1442968Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:42:17.1444642Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:42:17.1446575Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:42:17.1447617Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:42:17.1448539Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:42:17.1510953Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:42:17.1574984Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:42:17.1576323Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:42:17.1643950Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:42:17.1645429Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:42:17.1646775Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:42:17.1648136Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:42:17.1649478Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 
2025-12-04T09:42:17.1650772Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:42:17.1652087Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:42:17.1653392Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:42:17.1654650Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:42:17.1655849Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:42:17.1657023Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:42:17.1658165Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:42:17.1659309Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:42:17.1660425Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:42:17.1661573Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:42:17.1738683Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:42:17.1739686Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:42:17.1821047Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:42:17.1822056Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 
2025-12-04T09:42:17.1822851Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:42:17.1823666Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:42:17.1824539Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:42:17.1825533Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:42:17.1826676Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:42:17.1827752Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:42:17.1828761Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:42:17.1830020Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:42:17.1831082Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:42:17.1832125Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:42:17.1833179Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:42:17.1834304Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:42:17.1835420Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:42:17.1965428Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:42:17.1966479Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:42:17.1967534Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:42:17.1968721Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:42:17.1969871Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:42:17.1970936Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:42:17.1972041Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:42:17.1973168Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T09:42:17.1974287Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:42:17.1975382Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:42:17.1976473Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:42:17.1994479Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:42:17.2055292Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:42:17.2056439Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:42:17.2057456Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:42:17.2058368Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:42:17.2059209Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 
2025-12-04T09:42:17.2060033Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:42:17.2060850Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:42:17.2063210Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:42:17.2064737Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:42:17.2065469Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:42:17.2177681Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:42:17.2221556Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:42:17.2222136Z creating: build/lib/ 2025-12-04T09:42:17.2312648Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:42:17.2800316Z inflating: build/lib/libprotobuf.a 2025-12-04T09:42:17.3346555Z inflating: build/lib/libprotoc.a 2025-12-04T09:42:17.3357607Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:42:17.3366844Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:42:17.3375463Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:42:17.3376435Z inflating: build/lib/libclog.a 2025-12-04T09:42:17.3397451Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:42:17.3400150Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:42:17.3420062Z inflating: build/lib/libnnpack.a 2025-12-04T09:42:17.3626055Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:42:17.4596709Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:42:17.4673395Z inflating: build/lib/libgtest.a 2025-12-04T09:42:17.4692704Z inflating: build/lib/libgmock.a 2025-12-04T09:42:17.4693534Z inflating: build/lib/libgtest_main.a 2025-12-04T09:42:17.4694387Z inflating: build/lib/libgmock_main.a 2025-12-04T09:42:17.4794305Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:42:17.4877682Z inflating: 
build/lib/libbenchmark.a 2025-12-04T09:42:17.4878535Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:42:17.4887506Z inflating: build/lib/libittnotify.a 2025-12-04T09:42:17.4960545Z inflating: build/lib/libasmjit.a 2025-12-04T09:42:17.4961448Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:42:17.6242344Z inflating: build/lib/libfbgemm.a 2025-12-04T09:42:17.6276421Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:42:17.6872655Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:42:17.7140793Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:42:17.7289516Z inflating: build/lib/libgloo.a 2025-12-04T09:42:17.7341853Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:42:17.7812485Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:42:17.8596908Z inflating: build/lib/libonnx.a 2025-12-04T09:42:18.9685725Z inflating: build/lib/libdnnl.a 2025-12-04T09:42:18.9707412Z inflating: build/lib/libfmt.a 2025-12-04T09:42:19.0236826Z inflating: build/lib/libkineto.a 2025-12-04T09:42:19.0366057Z inflating: build/lib/libc10.so 2025-12-04T09:42:19.0421088Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:42:19.0422726Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:42:19.0424680Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:42:22.4588513Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:42:24.2551071Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:42:24.2555707Z inflating: build/lib/libshm.so 2025-12-04T09:42:24.2557151Z inflating: build/lib/libtorch.so 2025-12-04T09:42:24.2610730Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:42:24.2613525Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:42:24.2691823Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:42:24.2713499Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:42:24.2740003Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:42:24.2768936Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:42:24.5401263Z inflating: 
build/lib/libtorch_python.so 2025-12-04T09:42:24.5441255Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:42:24.5441649Z creating: build/bin/ 2025-12-04T09:42:24.5951930Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:42:24.6460796Z inflating: build/bin/protoc 2025-12-04T09:42:24.6527311Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:42:24.6589055Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:42:24.6652958Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:42:24.6716986Z inflating: build/bin/c10_Device_test 2025-12-04T09:42:24.6790316Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:42:24.6850877Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:42:24.6917937Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:42:24.6987261Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:42:24.7056097Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:42:24.7123714Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:42:24.7192494Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:42:24.7254365Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:42:24.7315185Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:42:24.7400376Z inflating: build/bin/c10_cow_test 2025-12-04T09:42:24.7465631Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:42:24.7527632Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:42:24.7597431Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:42:24.7660102Z inflating: build/bin/c10_Half_test 2025-12-04T09:42:24.7725515Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:42:24.7791025Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T09:42:24.7859659Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:42:24.7921539Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:42:24.7982810Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:42:24.8051320Z inflating: build/bin/c10_ThreadLocal_test 
2025-12-04T09:42:24.8115255Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:42:24.8178935Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:42:24.8247724Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:42:24.8317182Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:42:24.8379362Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:42:24.8440722Z inflating: build/bin/c10_error_test 2025-12-04T09:42:24.8509387Z inflating: build/bin/c10_complex_test 2025-12-04T09:42:24.8573972Z inflating: build/bin/c10_exception_test 2025-12-04T09:42:24.8636253Z inflating: build/bin/c10_flags_test 2025-12-04T09:42:24.8698347Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:42:24.8882872Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:42:24.8945839Z inflating: build/bin/c10_irange_test 2025-12-04T09:42:24.9011829Z inflating: build/bin/c10_lazy_test 2025-12-04T09:42:24.9081765Z inflating: build/bin/c10_logging_test 2025-12-04T09:42:24.9143603Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:42:24.9234051Z inflating: build/bin/c10_optional_test 2025-12-04T09:42:24.9309460Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:42:24.9375019Z inflating: build/bin/c10_registry_test 2025-12-04T09:42:24.9554302Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:42:24.9617976Z inflating: build/bin/c10_ssize_test 2025-12-04T09:42:24.9687369Z inflating: build/bin/c10_string_util_test 2025-12-04T09:42:24.9741672Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:42:24.9803831Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:42:24.9864238Z inflating: build/bin/c10_string_view_test 2025-12-04T09:42:24.9933451Z inflating: build/bin/c10_typeid_test 2025-12-04T09:42:24.9998426Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:42:25.0063966Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:42:25.0128235Z 
inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:42:25.0192944Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:42:25.0254095Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:42:25.0319652Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:42:25.0384669Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:42:25.0449554Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:42:25.1117739Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:42:25.1805216Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:42:25.2502947Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:42:25.2564136Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:42:25.2680397Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:42:25.2742448Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:42:25.2804237Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:42:25.2868953Z inflating: build/bin/BackoffTest 2025-12-04T09:42:25.2935118Z inflating: build/bin/FileStoreTest 2025-12-04T09:42:25.3004976Z inflating: build/bin/TCPStoreTest 2025-12-04T09:42:25.3071291Z inflating: build/bin/HashStoreTest 2025-12-04T09:42:25.3087032Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:42:25.3091309Z inflating: build/bin/torch_shm_manager 2025-12-04T09:42:25.3180023Z inflating: build/bin/Dict_test 2025-12-04T09:42:25.3244693Z inflating: build/bin/Dimname_test 2025-12-04T09:42:25.3323642Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:42:25.3393393Z inflating: build/bin/NamedTensor_test 2025-12-04T09:42:25.3465375Z inflating: build/bin/apply_utils_test 2025-12-04T09:42:25.3537678Z inflating: build/bin/atest 2025-12-04T09:42:25.3615805Z inflating: build/bin/basic 2025-12-04T09:42:25.3682528Z inflating: build/bin/broadcast_test 2025-12-04T09:42:25.3745527Z inflating: 
build/bin/cpu_allocator_test 2025-12-04T09:42:25.3816543Z inflating: build/bin/cpu_generator_test 2025-12-04T09:42:25.3881419Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:42:25.3991387Z inflating: build/bin/cpu_rng_test 2025-12-04T09:42:25.4054358Z inflating: build/bin/dlconvertor_test 2025-12-04T09:42:25.4125021Z inflating: build/bin/extension_backend_test 2025-12-04T09:42:25.4193285Z inflating: build/bin/half_test 2025-12-04T09:42:25.4309931Z inflating: build/bin/ivalue_test 2025-12-04T09:42:25.4371034Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:42:25.4436629Z inflating: build/bin/math_kernel_test 2025-12-04T09:42:25.4501979Z inflating: build/bin/memory_format_test 2025-12-04T09:42:25.4567830Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:42:25.4633500Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:42:25.4702361Z inflating: build/bin/native_test 2025-12-04T09:42:25.4765074Z inflating: build/bin/operator_name_test 2025-12-04T09:42:25.4827796Z inflating: build/bin/operators_test 2025-12-04T09:42:25.4892706Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:42:25.4974920Z inflating: build/bin/pow_test 2025-12-04T09:42:25.5044939Z inflating: build/bin/quantized_test 2025-12-04T09:42:25.5106415Z inflating: build/bin/reduce_ops_test 2025-12-04T09:42:25.5169382Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:42:25.5238545Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:42:25.5309356Z inflating: build/bin/scalar_test 2025-12-04T09:42:25.5373292Z inflating: build/bin/StorageUtils_test 2025-12-04T09:42:25.5437630Z inflating: build/bin/stride_properties_test 2025-12-04T09:42:25.5533230Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:42:25.5600148Z inflating: build/bin/test_parallel 2025-12-04T09:42:25.5662750Z inflating: build/bin/thread_init_test 2025-12-04T09:42:25.5730735Z inflating: build/bin/type_ptr_test 2025-12-04T09:42:25.5803773Z inflating: build/bin/type_test 
2025-12-04T09:42:25.5868475Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:42:25.5930295Z inflating: build/bin/verify_api_visibility 2025-12-04T09:42:25.6016171Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:42:25.6079475Z inflating: build/bin/weakref_test 2025-12-04T09:42:25.6143393Z inflating: build/bin/wrapdim_test 2025-12-04T09:42:25.6206530Z inflating: build/bin/xla_tensor_test 2025-12-04T09:42:25.6279464Z inflating: build/bin/IListRef_test 2025-12-04T09:42:25.6405325Z inflating: build/bin/List_test 2025-12-04T09:42:25.6485749Z inflating: build/bin/KernelFunction_test 2025-12-04T09:42:25.6628236Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:42:25.6742122Z inflating: build/bin/kernel_function_test 2025-12-04T09:42:25.6891342Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:42:25.7012276Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:42:25.7085546Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:42:25.7199628Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:42:25.7263066Z inflating: build/bin/CppSignature_test 2025-12-04T09:42:25.7330885Z inflating: build/bin/backend_fallback_test 2025-12-04T09:42:25.7391764Z inflating: build/bin/op_allowlist_test 2025-12-04T09:42:25.7748404Z inflating: build/bin/op_registration_test 2025-12-04T09:42:25.7830395Z inflating: build/bin/inline_container_test 2025-12-04T09:42:25.7897130Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:42:25.7962609Z inflating: build/bin/cuda_apply_test 2025-12-04T09:42:25.8035928Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:42:25.8105341Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:42:25.8189827Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:42:25.8263035Z inflating: build/bin/cuda_complex_test 2025-12-04T09:42:25.8334760Z inflating: build/bin/cuda_cub_test 2025-12-04T09:42:25.8400064Z inflating: build/bin/cuda_cublas_handle_pool_test 
2025-12-04T09:42:25.8461565Z inflating: build/bin/cuda_device_test 2025-12-04T09:42:25.8540660Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:42:25.8605095Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:42:25.8670975Z inflating: build/bin/cuda_event_test 2025-12-04T09:42:25.8732664Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:42:25.8802409Z inflating: build/bin/cuda_generator_test 2025-12-04T09:42:25.8863981Z inflating: build/bin/cuda_half_test 2025-12-04T09:42:25.8927200Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:42:25.8988498Z inflating: build/bin/cuda_optional_test 2025-12-04T09:42:25.9052803Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:42:25.9117714Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:42:25.9179392Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:42:25.9254139Z inflating: build/bin/cuda_stream_test 2025-12-04T09:42:25.9319355Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:42:25.9380736Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:42:25.9780393Z inflating: build/bin/test_lazy 2025-12-04T09:42:25.9862217Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:42:25.9931803Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:42:26.1186261Z inflating: build/bin/test_jit 2025-12-04T09:42:26.1264202Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:42:26.1339454Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:42:26.1343030Z inflating: build/bin/example_allreduce 2025-12-04T09:42:26.1411359Z inflating: build/bin/test_dist_autograd 2025-12-04T09:42:26.1494609Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:42:26.1497340Z inflating: build/bin/parallel_benchmark 2025-12-04T09:42:26.2837817Z inflating: build/bin/test_api 2025-12-04T09:42:26.2838241Z creating: .additional_ci_files/ 2025-12-04T09:42:26.2910037Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:42:26.3172259Z inflating: 
.additional_ci_files/test-class-times.json 2025-12-04T09:42:26.3204202Z ##[group]Run rm artifacts.zip 2025-12-04T09:42:26.3204574Z rm artifacts.zip 2025-12-04T09:42:26.3211708Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:26.3212143Z env: 2025-12-04T09:42:26.3212407Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:26.3212723Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:26.3213248Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:26.3213676Z ##[endgroup] 2025-12-04T09:42:26.3834481Z ##[group]Run df -H 2025-12-04T09:42:26.3834786Z df -H 2025-12-04T09:42:26.3841399Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:26.3841836Z env: 2025-12-04T09:42:26.3842188Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:26.3842508Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:26.3842880Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:26.3843494Z ##[endgroup] 2025-12-04T09:42:26.3893866Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:42:26.3894313Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:42:26.3894720Z tmpfs 34G 0 34G 0% /dev/shm 2025-12-04T09:42:26.3895128Z tmpfs 14G 562k 14G 1% /run 2025-12-04T09:42:26.3895509Z /dev/nvme0n1p1 161G 51G 111G 32% / 2025-12-04T09:42:26.3896017Z tmpfs 34G 17k 34G 1% /tmp 2025-12-04T09:42:26.3896483Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:42:26.3896900Z tmpfs 6.7G 0 6.7G 0% /run/user/0 2025-12-04T09:42:26.3935163Z Prepare all required actions 2025-12-04T09:42:26.3936106Z Getting action download info 2025-12-04T09:42:26.5601979Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:42:26.5602474Z with: 2025-12-04T09:42:26.5602734Z env: 2025-12-04T09:42:26.5603006Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:26.5603346Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:26.5603704Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:26.5604130Z ##[endgroup] 2025-12-04T09:42:26.5635529Z ##[group]Run 
seemethere/download-artifact-s3@v4 2025-12-04T09:42:26.5635933Z with: 2025-12-04T09:42:26.5636161Z name: td_results 2025-12-04T09:42:26.5636443Z s3-bucket: gha-artifacts 2025-12-04T09:42:26.5636749Z region: us-east-1 2025-12-04T09:42:26.5636992Z env: 2025-12-04T09:42:26.5637236Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:26.5637543Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:26.5637911Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:26.5638394Z ##[endgroup] 2025-12-04T09:42:27.1198800Z (node:68908) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:42:27.1199411Z 2025-12-04T09:42:27.1199633Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:42:27.1200422Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:42:27.1201279Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:42:27.2303755Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/ 2025-12-04T09:42:27.2304514Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:42:27.3287529Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:42:27.3293382Z Artifact download has finished successfully 2025-12-04T09:42:27.3491374Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:42:27.3492062Z mkdir -p .additional_ci_files 2025-12-04T09:42:27.3492871Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:42:27.3503261Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:27.3504017Z env: 2025-12-04T09:42:27.3504436Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:27.3504981Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:27.3505601Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:27.3506346Z ##[endgroup] 2025-12-04T09:42:27.3633666Z ##[group]Run 
.github/scripts/parse_ref.py 2025-12-04T09:42:27.3634113Z .github/scripts/parse_ref.py 2025-12-04T09:42:27.3640328Z shell: /usr/bin/bash -e {0} 2025-12-04T09:42:27.3640647Z env: 2025-12-04T09:42:27.3640896Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:27.3641196Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:27.3641567Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:27.3641988Z ##[endgroup] 2025-12-04T09:42:27.3969394Z Setting output branch=main 2025-12-04T09:42:27.4117140Z Prepare all required actions 2025-12-04T09:42:27.4117619Z Getting action download info 2025-12-04T09:42:27.5589136Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:42:27.5589551Z with: 2025-12-04T09:42:27.5590032Z github-token: *** 2025-12-04T09:42:27.5599408Z test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": 
"unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T09:42:27.5613390Z job-name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:27.5614791Z env: 2025-12-04T09:42:27.5615211Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:27.5615758Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:27.5616400Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:27.5617160Z ##[endgroup] 2025-12-04T09:42:27.5708743Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:42:27.5709370Z with: 2025-12-04T09:42:27.5709727Z shell: bash 2025-12-04T09:42:27.5710161Z timeout_minutes: 10 2025-12-04T09:42:27.5710622Z max_attempts: 5 2025-12-04T09:42:27.5711055Z retry_wait_seconds: 30 2025-12-04T09:42:27.5712734Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:42:27.5714608Z polling_interval_seconds: 1 2025-12-04T09:42:27.5715198Z warning_on_retry: true 2025-12-04T09:42:27.5715734Z continue_on_error: false 2025-12-04T09:42:27.5716236Z env: 2025-12-04T09:42:27.5716646Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:27.5717165Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:27.5717807Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:27.5719084Z GITHUB_TOKEN: *** 2025-12-04T09:42:27.5719556Z ##[endgroup] 2025-12-04T09:42:27.6804649Z + python3 -m 
pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:42:27.9572725Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:42:28.1276859Z Collecting requests==2.27.1 2025-12-04T09:42:28.1454118Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:42:28.3968097Z Collecting pyyaml==6.0.2 2025-12-04T09:42:28.4030096Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:42:28.9329743Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:42:28.9369884Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:42:29.0458062Z Collecting certifi>=2017.4.17 2025-12-04T09:42:29.0497431Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:42:29.0828107Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:42:29.0833288Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:42:29.1798696Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:42:29.4899587Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:42:29.6587604Z Command completed after 1 attempt(s). 
2025-12-04T09:42:29.6664884Z ##[group]Run set -x 2025-12-04T09:42:29.6665179Z set -x 2025-12-04T09:42:29.6665442Z  2025-12-04T09:42:29.6665908Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:42:29.6666465Z # in runner workspace 2025-12-04T09:42:29.6666921Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:42:29.6673684Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:29.6674126Z env: 2025-12-04T09:42:29.6674380Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.6674687Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.6675066Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:29.6675484Z ##[endgroup] 2025-12-04T09:42:29.6704059Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:42:29.6911661Z Setting output branch=main 2025-12-04T09:42:29.6992776Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:42:29.6993278Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:42:29.6993700Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:42:29.6994073Z  2025-12-04T09:42:29.6994518Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:42:29.6995084Z # in runner workspace 2025-12-04T09:42:29.6995587Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:42:29.6996175Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:42:29.6996606Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:42:29.7004235Z  --test-matrix "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, 
"num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" \ 2025-12-04T09:42:29.7011691Z  --selected-test-configs "" \ 2025-12-04T09:42:29.7012106Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:42:29.7012588Z  --tag "${TAG}" \ 2025-12-04T09:42:29.7012935Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:42:29.7013319Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:42:29.7013671Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:42:29.7020627Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:29.7021075Z env: 2025-12-04T09:42:29.7021327Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.7021630Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.7022003Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 
2025-12-04T09:42:29.7022769Z GITHUB_TOKEN: *** 2025-12-04T09:42:29.7023500Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:29.7024272Z PR_NUMBER: 2025-12-04T09:42:29.7024524Z TAG: 2025-12-04T09:42:29.7024780Z EVENT_NAME: schedule 2025-12-04T09:42:29.7025092Z SCHEDULE: 29 8 * * * 2025-12-04T09:42:29.7025384Z HEAD_BRANCH: main 2025-12-04T09:42:29.7025651Z ##[endgroup] 2025-12-04T09:42:29.7052760Z Workflow: periodic 2025-12-04T09:42:29.7053509Z Job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:29.9051728Z Setting output keep-going=True 2025-12-04T09:42:29.9052158Z Setting output ci-verbose-test-logs=False 2025-12-04T09:42:29.9052560Z Setting output ci-test-showlocals=False 2025-12-04T09:42:29.9052960Z Setting output ci-no-test-timeout=False 2025-12-04T09:42:29.9053338Z Setting output ci-no-td=False 2025-12-04T09:42:29.9053714Z Setting output ci-td-distributed=False 2025-12-04T09:42:29.9054094Z Setting output is-unstable=True 2025-12-04T09:42:29.9054445Z Setting output reenabled-issues= 2025-12-04T09:42:29.9070595Z Setting output test-matrix={"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", 
"rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": 
"rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]} 2025-12-04T09:42:29.9086739Z Setting output is-test-matrix-empty=False 2025-12-04T09:42:29.9271967Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:42:29.9272446Z echo "Filtered matrix:" 2025-12-04T09:42:29.9288342Z echo "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": 
"unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", 
"shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" 2025-12-04T09:42:29.9304508Z  2025-12-04T09:42:29.9304754Z echo 2025-12-04T09:42:29.9305072Z echo "Is the current job unstable? True" 2025-12-04T09:42:29.9305445Z  2025-12-04T09:42:29.9305681Z echo 2025-12-04T09:42:29.9305987Z echo "Is keep-going label set? True" 2025-12-04T09:42:29.9306347Z  2025-12-04T09:42:29.9306581Z echo 2025-12-04T09:42:29.9306866Z echo "Reenabled issues? 
" 2025-12-04T09:42:29.9313684Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:29.9314136Z env: 2025-12-04T09:42:29.9314397Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.9314714Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.9315070Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:29.9315501Z ##[endgroup] 2025-12-04T09:42:29.9342961Z Filtered matrix: 2025-12-04T09:42:29.9362142Z {include: [{config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: 
legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}]} 2025-12-04T09:42:29.9377724Z 2025-12-04T09:42:29.9377847Z Is the current job 
unstable? True 2025-12-04T09:42:29.9378088Z 2025-12-04T09:42:29.9378206Z Is keep-going label set? True 2025-12-04T09:42:29.9378428Z 2025-12-04T09:42:29.9378544Z Reenabled issues? 2025-12-04T09:42:29.9455048Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:29.9455685Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:42:29.9462232Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:29.9462663Z env: 2025-12-04T09:42:29.9462914Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.9463234Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.9463593Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:29.9464016Z JOB_TIMEOUT: 600 2025-12-04T09:42:29.9464287Z ##[endgroup] 2025-12-04T09:42:29.9545107Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:29.9545730Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:29.9546290Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:42:29.9552587Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:42:29.9553033Z env: 2025-12-04T09:42:29.9553289Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.9553603Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.9553954Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:29.9554365Z ##[endgroup] 2025-12-04T09:42:29.9690941Z ##[group]Run set -x 2025-12-04T09:42:29.9691351Z set -x 2025-12-04T09:42:29.9691608Z  2025-12-04T09:42:29.9691902Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:42:29.9692354Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:42:29.9692823Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:42:29.9693251Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:42:29.9693594Z else 2025-12-04T09:42:29.9694033Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:42:29.9694408Z fi 2025-12-04T09:42:29.9694636Z  2025-12-04T09:42:29.9694949Z # Leaving 1GB for the runner 
and other things 2025-12-04T09:42:29.9695646Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:42:29.9696695Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:42:29.9697547Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:42:29.9698184Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:42:29.9698679Z  2025-12-04T09:42:29.9698974Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:42:29.9699385Z  SHM_OPTS= 2025-12-04T09:42:29.9699677Z  JENKINS_USER= 2025-12-04T09:42:29.9700081Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:42:29.9700658Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:42:29.9701337Z  # when job is cancelled 2025-12-04T09:42:29.9701707Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:42:29.9702090Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:42:29.9702459Z else 2025-12-04T09:42:29.9702757Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:42:29.9703146Z  JENKINS_USER="--user jenkins" 2025-12-04T09:42:29.9703514Z  DOCKER_SHELL_CMD= 2025-12-04T09:42:29.9703976Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:42:29.9704312Z fi 2025-12-04T09:42:29.9704615Z  2025-12-04T09:42:29.9705014Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:42:29.9705790Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:42:29.9706801Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:42:29.9707465Z # shellcheck disable=SC2086,SC2090 2025-12-04T09:42:29.9707861Z container_name=$(docker run \ 2025-12-04T09:42:29.9708218Z  ${GPU_FLAG:-} \ 2025-12-04T09:42:29.9708573Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:42:29.9708984Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:42:29.9709333Z  -e PR_NUMBER \ 
2025-12-04T09:42:29.9709647Z  -e GITHUB_ACTIONS \ 2025-12-04T09:42:29.9709988Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:42:29.9710341Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:42:29.9710660Z  -e GITHUB_JOB \ 2025-12-04T09:42:29.9710971Z  -e GITHUB_RUN_ID \ 2025-12-04T09:42:29.9711298Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:42:29.9711636Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:42:29.9711976Z  -e JOB_ID \ 2025-12-04T09:42:29.9712271Z  -e JOB_NAME \ 2025-12-04T09:42:29.9712574Z  -e BASE_SHA \ 2025-12-04T09:42:29.9712861Z  -e BRANCH \ 2025-12-04T09:42:29.9713149Z  -e SHA1 \ 2025-12-04T09:42:29.9713440Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:42:29.9713774Z  -e IN_WHEEL_TEST \ 2025-12-04T09:42:29.9714099Z  -e SHARD_NUMBER \ 2025-12-04T09:42:29.9714420Z  -e TEST_CONFIG \ 2025-12-04T09:42:29.9714729Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:42:29.9715277Z  -e REENABLED_ISSUES \ 2025-12-04T09:42:29.9715644Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:42:29.9716002Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:42:29.9716354Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:42:29.9716692Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:42:29.9717024Z  -e NO_TD \ 2025-12-04T09:42:29.9717311Z  -e TD_DISTRIBUTED \ 2025-12-04T09:42:29.9717645Z  -e PR_LABELS \ 2025-12-04T09:42:29.9717996Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:42:29.9718494Z  -e SCCACHE_BUCKET \ 2025-12-04T09:42:29.9718830Z  -e SCCACHE_REGION \ 2025-12-04T09:42:29.9719156Z  -e XLA_CUDA \ 2025-12-04T09:42:29.9719482Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:42:29.9719910Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:42:29.9720345Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:42:29.9720782Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:42:29.9721174Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:42:29.9721561Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:42:29.9721967Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:42:29.9722425Z  -e DASHBOARD_TAG \ 2025-12-04T09:42:29.9722761Z  -e ARTIFACTS_FILE_SUFFIX \ 
2025-12-04T09:42:29.9723194Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:42:29.9723673Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:42:29.9724166Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:42:29.9724631Z  --security-opt seccomp=unconfined \ 2025-12-04T09:42:29.9725030Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:42:29.9725363Z  --ipc=host \ 2025-12-04T09:42:29.9725663Z  ${SHM_OPTS} \ 2025-12-04T09:42:29.9725960Z  --tty \ 2025-12-04T09:42:29.9726224Z  --detach \ 2025-12-04T09:42:29.9726535Z  --name="${container_name}" \ 2025-12-04T09:42:29.9726902Z  ${JENKINS_USER} \ 2025-12-04T09:42:29.9727299Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:42:29.9727768Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:42:29.9728144Z  "${USED_IMAGE}" \ 2025-12-04T09:42:29.9728467Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:42:29.9728768Z ) 2025-12-04T09:42:29.9729159Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:42:29.9729647Z  2025-12-04T09:42:29.9729944Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:42:29.9730628Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:42:29.9731243Z fi 2025-12-04T09:42:29.9731484Z  2025-12-04T09:42:29.9732053Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:42:29.9738725Z shell: /usr/bin/bash -e {0} 2025-12-04T09:42:29.9739044Z env: 2025-12-04T09:42:29.9739282Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:42:29.9739594Z HAS_NVIDIA_GPU: true 2025-12-04T09:42:29.9739963Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:29.9740460Z BUILD_ENVIRONMENT: linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:29.9740884Z PR_NUMBER: 2025-12-04T09:42:29.9741167Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:42:29.9741544Z GITHUB_WORKFLOW: periodic 2025-12-04T09:42:29.9741839Z GITHUB_JOB: 
test 2025-12-04T09:42:29.9742118Z GITHUB_RUN_ID: 19922826259 2025-12-04T09:42:29.9742433Z GITHUB_RUN_NUMBER: 19107 2025-12-04T09:42:29.9742724Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:42:29.9743017Z JOB_ID: 57119749427 2025-12-04T09:42:29.9743739Z JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:29.9744621Z BRANCH: main 2025-12-04T09:42:29.9744937Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:29.9745398Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:29.9745821Z TEST_CONFIG: legacy_nvidia_driver 2025-12-04T09:42:29.9746164Z SHARD_NUMBER: 4 2025-12-04T09:42:29.9746433Z NUM_TEST_SHARDS: 5 2025-12-04T09:42:29.9746710Z EXTRA_FLAGS: 2025-12-04T09:42:29.9746959Z OP_BENCHMARK_TESTS: 2025-12-04T09:42:29.9747245Z REENABLED_ISSUES: 2025-12-04T09:42:29.9747607Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:42:29.9747923Z VERBOSE_TEST_LOGS: False 2025-12-04T09:42:29.9748233Z TEST_SHOWLOCALS: False 2025-12-04T09:42:29.9748539Z NO_TEST_TIMEOUT: False 2025-12-04T09:42:29.9748813Z NO_TD: False 2025-12-04T09:42:29.9749077Z TD_DISTRIBUTED: False 2025-12-04T09:42:29.9749442Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:42:29.9749854Z SCCACHE_REGION: us-east-1 2025-12-04T09:42:29.9750159Z SHM_SIZE: 2g 2025-12-04T09:42:29.9751084Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:29.9752778Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:29.9753792Z XLA_CUDA: 2025-12-04T09:42:29.9754211Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:42:29.9754753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:42:29.9755131Z 
PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:42:29.9755473Z DASHBOARD_TAG: 2025-12-04T09:42:29.9755980Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:42:29.9756464Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:42:29.9756951Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:42:29.9757556Z ARTIFACTS_FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T09:42:29.9758210Z ##[endgroup] 2025-12-04T09:42:29.9784842Z + [[ legacy_nvidia_driver == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:42:29.9785309Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:42:29.9785725Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:42:29.9788569Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:42:29.9810598Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 ' 2025-12-04T09:42:29.9810988Z + TOTAL_MEMORY_WITH_SWAP=64 2025-12-04T09:42:29.9811393Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:42:29.9811847Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:42:29.9812159Z + JENKINS_USER='--user jenkins' 2025-12-04T09:42:29.9812469Z + DOCKER_SHELL_CMD= 2025-12-04T09:42:29.9813400Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:29.9819835Z +++ nproc --ignore=2 2025-12-04T09:42:30.0014949Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e 
XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:42:37.6856201Z + container_name=428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T09:42:37.6857078Z + echo DOCKER_CONTAINER_ID=428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T09:42:37.6857906Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:42:37.6862585Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:42:37.6865006Z + docker exec -t 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:42:38.1856489Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:42:39.0644388Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:42:39.0649217Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:42:39.0667263Z Requirement already satisfied: sympy>=1.13.3 in 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:42:39.0669328Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:42:39.0670898Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:42:39.0672485Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:42:39.0684787Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:42:39.1111351Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:42:39.1134394Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:42:39.1203113Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:42:39.5471954Z Installing collected packages: torch 2025-12-04T09:42:51.9109250Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:42:51.9881701Z + export TERM=vt100 2025-12-04T09:42:51.9882054Z + TERM=vt100 2025-12-04T09:42:51.9883534Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:42:51.9892290Z + source .ci/pytorch/common.sh 2025-12-04T09:42:51.9895763Z +++ dirname .ci/pytorch/common.sh 
2025-12-04T09:42:51.9903626Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:42:51.9905087Z +++ declare -f -t trap_add 2025-12-04T09:42:51.9911111Z ++ set -ex -o pipefail 2025-12-04T09:42:51.9911467Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:42:51.9911893Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:42:51.9915307Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:42:51.9923144Z + source .ci/pytorch/common-build.sh 2025-12-04T09:42:51.9924964Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 != *win-* ]] 2025-12-04T09:42:51.9931408Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:42:51.9940303Z +++ cd .ci/pytorch 2025-12-04T09:42:51.9940655Z +++ pwd -P 2025-12-04T09:42:51.9949822Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:42:51.9950638Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:42:51.9951409Z ++ which sccache 2025-12-04T09:42:51.9970705Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:42:51.9971257Z ++ sccache --stop-server 2025-12-04T09:42:51.9998384Z ++ true 2025-12-04T09:42:51.9998886Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:42:52.0009187Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:42:52.0009830Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:42:52.0010362Z ++ shift 2025-12-04T09:42:52.0010629Z ++ for trap_add_name in "$@" 2025-12-04T09:42:52.0016343Z ++++ trap -p EXIT 2025-12-04T09:42:52.0018512Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:42:52.0018906Z ++++ extract_trap_cmd 2025-12-04T09:42:52.0019193Z ++++ printf '%s\n' '' 2025-12-04T09:42:52.0019558Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:42:52.0021375Z ++ trap -- ' 2025-12-04T09:42:52.0021826Z sccache_epilogue' EXIT 2025-12-04T09:42:52.0022343Z ++ [[ -n 1 ]] 2025-12-04T09:42:52.0023078Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:42:52.0024284Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:42:52.0024986Z ++ 
export SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:52.0025322Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:52.0025747Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:52.0026295Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:52.0033514Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:42:52.0033930Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:42:52.0034281Z ++ sccache --zero-stats 2025-12-04T09:42:52.1156590Z Statistics zeroed. 2025-12-04T09:42:52.1161740Z ++ which ccache 2025-12-04T09:42:52.1188567Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:42:52.1189107Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:42:52.1189552Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:42:52.1192684Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:42:52.1210711Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:42:52.1211094Z + trap_add cleanup_workspace EXIT 2025-12-04T09:42:52.1211459Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:42:52.1211768Z + shift 2025-12-04T09:42:52.1212064Z + for trap_add_name in "$@" 2025-12-04T09:42:52.1218624Z +++ trap -p EXIT 2025-12-04T09:42:52.1222090Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:42:52.1222519Z sccache_epilogue'\'' EXIT' 2025-12-04T09:42:52.1222858Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:42:52.1223186Z sccache_epilogue' EXIT 2025-12-04T09:42:52.1223462Z +++ printf '%s\n' ' 2025-12-04T09:42:52.1223732Z sccache_epilogue' 2025-12-04T09:42:52.1224024Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:42:52.1224961Z + trap -- ' 2025-12-04T09:42:52.1225217Z sccache_epilogue 2025-12-04T09:42:52.1225513Z cleanup_workspace' EXIT 2025-12-04T09:42:52.1225855Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:42:52.8547125Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:42:52.8566235Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:42:52.8569512Z ++ 
python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:42:53.3417933Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:53.3418719Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:42:53.3424603Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:42:53.3434706Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:42:53.3458132Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:42:53.3458824Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:53.3459847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:42:53.3460399Z + patch -p4 2025-12-04T09:42:53.3473845Z patching file cudadrv/driver.py 2025-12-04T09:42:53.3474220Z Hunk #1 succeeded at 357 (offset -8 lines). 2025-12-04T09:42:53.3486402Z + popd 2025-12-04T09:42:53.3486656Z ~/workspace 2025-12-04T09:42:53.3486939Z + echo 'Environment variables:' 2025-12-04T09:42:53.3487259Z Environment variables: 2025-12-04T09:42:53.3487539Z + env 2025-12-04T09:42:53.3496959Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:42:53.3498011Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:42:53.3498533Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:53.3499271Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:42:53.3499612Z HOSTNAME=428ca50ff249 2025-12-04T09:42:53.3500283Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3501231Z GITHUB_ACTION=__run_3 2025-12-04T09:42:53.3501559Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:42:53.3501896Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:42:53.3502212Z TEST_CONFIG=legacy_nvidia_driver 2025-12-04T09:42:53.3502565Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:42:53.3502948Z 
TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:42:53.3503307Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:53.3503777Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:42:53.3504167Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:42:53.3504515Z GITHUB_REF_TYPE=branch 2025-12-04T09:42:53.3504871Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3505269Z XLA_CUDA= 2025-12-04T09:42:53.3505514Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:42:53.3505991Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:42:53.3506499Z *** 2025-12-04T09:42:53.3506744Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:42:53.3507060Z GITHUB_ACTIONS=true 2025-12-04T09:42:53.3507350Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:53.3507755Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:53.3508203Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3508656Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3509289Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:42:53.3509853Z UCC_HOME=/usr 2025-12-04T09:42:53.3510101Z VERBOSE_TEST_LOGS=False 2025-12-04T09:42:53.3510400Z GITHUB_REF=refs/heads/main 2025-12-04T09:42:53.3510706Z SHARD_NUMBER=4 2025-12-04T09:42:53.3510968Z GITHUB_REF_PROTECTED=true 2025-12-04T09:42:53.3511281Z HOME=/var/lib/jenkins 2025-12-04T09:42:53.3511601Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:42:53.3511970Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:42:53.3512368Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:42:53.3512767Z USE_SYSTEM_NCCL=1 2025-12-04T09:42:53.3513018Z NUM_TEST_SHARDS=5 2025-12-04T09:42:53.3513277Z UCX_HOME=/usr 2025-12-04T09:42:53.3513946Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3515144Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, 
mem_leak_check, unstable) 2025-12-04T09:42:53.3516290Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3517252Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:42:53.3517858Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:42:53.3518153Z DASHBOARD_TAG= 2025-12-04T09:42:53.3518419Z GITHUB_RUN_ID=19922826259 2025-12-04T09:42:53.3518721Z INSTALLED_OPENBLAS= 2025-12-04T09:42:53.3519427Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3520223Z GITHUB_ACTOR=huydhn 2025-12-04T09:42:53.3520486Z PR_NUMBER= 2025-12-04T09:42:53.3520726Z DESIRED_CUDA=12.4 2025-12-04T09:42:53.3521172Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:42:53.3521478Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:42:53.3521870Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:42:53.3522358Z TERM=vt100 2025-12-04T09:42:53.3522606Z INSTALLED_VISION=yes 2025-12-04T09:42:53.3522880Z BRANCH=main 2025-12-04T09:42:53.3523123Z SCCACHE_REGION=us-east-1 2025-12-04T09:42:53.3523439Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:42:53.3523767Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:42:53.3524057Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:42:53.3524782Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:42:53.3525469Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:42:53.3525867Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:42:53.3526297Z REENABLED_ISSUES= 2025-12-04T09:42:53.3526553Z DOCS= 2025-12-04T09:42:53.3526780Z SHLVL=1 2025-12-04T09:42:53.3526991Z MAX_JOBS=14 2025-12-04T09:42:53.3527243Z GITHUB_ACTOR_ID=475357 2025-12-04T09:42:53.3527648Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3528091Z GITHUB_REF_NAME=main 2025-12-04T09:42:53.3528537Z 
XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:42:53.3529035Z GITHUB_JOB=test 2025-12-04T09:42:53.3529283Z NO_TEST_TIMEOUT=False 2025-12-04T09:42:53.3529574Z TD_DISTRIBUTED=False 2025-12-04T09:42:53.3529872Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:42:53.3530228Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:42:53.3530526Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:42:53.3530831Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:42:53.3531761Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:42:53.3532711Z GITHUB_BASE_REF= 2025-12-04T09:42:53.3532974Z INSTALLED_ACL= 2025-12-04T09:42:53.3533512Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T09:42:53.3534124Z CI=true 2025-12-04T09:42:53.3534378Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:42:53.3534756Z RUST_LOG=sccache::server=error 2025-12-04T09:42:53.3535061Z JOB_ID=57119749427 2025-12-04T09:42:53.3535324Z GITHUB_HEAD_REF= 2025-12-04T09:42:53.3535585Z GITHUB_ACTION_REF= 2025-12-04T09:42:53.3535929Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:42:53.3536325Z TEST_SHOWLOCALS=False 2025-12-04T09:42:53.3536620Z GITHUB_WORKFLOW=periodic 2025-12-04T09:42:53.3536938Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:42:53.3537675Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3538431Z NO_TD=False 2025-12-04T09:42:53.3538694Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:42:53.3539034Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:42:53.3539400Z _=/usr/bin/env 2025-12-04T09:42:53.3539815Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:53.3540433Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:42:53.3649123Z + 
TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:42:53.3649937Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:42:53.3650649Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:42:53.3651359Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:42:53.3651982Z + BUILD_DIR=build 2025-12-04T09:42:53.3652259Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:42:53.3652610Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:42:53.3652908Z + SHARD_NUMBER=4 2025-12-04T09:42:53.3653160Z + NUM_TEST_SHARDS=5 2025-12-04T09:42:53.3653468Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:42:53.3653831Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:42:53.3654141Z + export VALGRIND=ON 2025-12-04T09:42:53.3654421Z + VALGRIND=ON 2025-12-04T09:42:53.3654919Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:42:53.3655391Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:42:53.3655775Z + detect_cuda_arch 2025-12-04T09:42:53.3656097Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:42:53.3656506Z + command -v nvidia-smi 2025-12-04T09:42:53.3656787Z /usr/bin/nvidia-smi 2025-12-04T09:42:53.3659227Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:42:53.3659927Z ++ tail -n 1 2025-12-04T09:42:53.3890022Z + TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:42:53.3890615Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:42:53.3891005Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:42:53.3891413Z + [[ 0 == \1 ]] 2025-12-04T09:42:53.3891649Z + [[ True == \1 ]] 2025-12-04T09:42:53.3891981Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:42:53.3894855Z ++ realpath build/custom_test_artifacts 2025-12-04T09:42:53.3925589Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:42:53.3926184Z + [[ -n '' ]] 
2025-12-04T09:42:53.3926462Z + echo 'Environment variables' 2025-12-04T09:42:53.3926786Z Environment variables 2025-12-04T09:42:53.3927069Z + env 2025-12-04T09:42:53.3951162Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:42:53.3951973Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:42:53.3952381Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T09:42:53.3953625Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:42:53.3954136Z HOSTNAME=428ca50ff249 2025-12-04T09:42:53.3954847Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3956075Z GITHUB_ACTION=__run_3 2025-12-04T09:42:53.3956386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:42:53.3956741Z GITHUB_RUN_NUMBER=19107 2025-12-04T09:42:53.3957038Z TEST_CONFIG=legacy_nvidia_driver 2025-12-04T09:42:53.3957394Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:42:53.3957785Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:42:53.3958142Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:42:53.3958603Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:42:53.3958946Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:42:53.3959275Z GITHUB_REF_TYPE=branch 2025-12-04T09:42:53.3959780Z TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:42:53.3960415Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3961140Z XLA_CUDA= 2025-12-04T09:42:53.3961599Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:42:53.3962680Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:42:53.3963330Z *** 2025-12-04T09:42:53.3963562Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:42:53.3963892Z GITHUB_ACTIONS=true 2025-12-04T09:42:53.3964185Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:42:53.3964629Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:42:53.3965094Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3965531Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3966168Z 
GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main 2025-12-04T09:42:53.3966739Z UCC_HOME=/usr 2025-12-04T09:42:53.3966994Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:42:53.3967313Z VERBOSE_TEST_LOGS=False 2025-12-04T09:42:53.3967614Z GITHUB_REF=refs/heads/main 2025-12-04T09:42:53.3967898Z SHARD_NUMBER=4 2025-12-04T09:42:53.3968164Z GITHUB_REF_PROTECTED=true 2025-12-04T09:42:53.3968465Z HOME=/var/lib/jenkins 2025-12-04T09:42:53.3968774Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:42:53.3969169Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:42:53.3969575Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:42:53.3969960Z USE_SYSTEM_NCCL=1 2025-12-04T09:42:53.3970226Z NUM_TEST_SHARDS=5 2025-12-04T09:42:53.3970485Z UCX_HOME=/usr 2025-12-04T09:42:53.3971154Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3972579Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T09:42:53.3973761Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3974738Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:42:53.3975354Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:42:53.3975652Z DASHBOARD_TAG= 2025-12-04T09:42:53.3975930Z GITHUB_RUN_ID=19922826259 2025-12-04T09:42:53.3976352Z INSTALLED_OPENBLAS= 2025-12-04T09:42:53.3977084Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3977906Z GITHUB_ACTOR=huydhn 2025-12-04T09:42:53.3978182Z PR_NUMBER= 2025-12-04T09:42:53.3978417Z DESIRED_CUDA=12.4 2025-12-04T09:42:53.3978689Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:42:53.3978968Z VALGRIND=ON 2025-12-04T09:42:53.3979221Z 
ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:42:53.3979623Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:42:53.3980042Z TERM=vt100 2025-12-04T09:42:53.3980288Z INSTALLED_VISION=yes 2025-12-04T09:42:53.3980574Z BRANCH=main 2025-12-04T09:42:53.3980827Z SCCACHE_REGION=us-east-1 2025-12-04T09:42:53.3981132Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:42:53.3981467Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:42:53.3981780Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:42:53.3982395Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:42:53.3983086Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:42:53.3983518Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:42:53.3983922Z REENABLED_ISSUES= 2025-12-04T09:42:53.3984171Z DOCS= 2025-12-04T09:42:53.3984395Z SHLVL=1 2025-12-04T09:42:53.3984625Z MAX_JOBS=14 2025-12-04T09:42:53.3984864Z GITHUB_ACTOR_ID=475357 2025-12-04T09:42:53.3985265Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:42:53.3985730Z GITHUB_REF_NAME=main 2025-12-04T09:42:53.3986161Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:42:53.3986670Z GITHUB_JOB=test 2025-12-04T09:42:53.3986933Z NO_TEST_TIMEOUT=False 2025-12-04T09:42:53.3987206Z TD_DISTRIBUTED=False 2025-12-04T09:42:53.3987507Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:42:53.3987855Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:42:53.3988144Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:42:53.3988458Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:42:53.3989391Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:42:53.3990372Z GITHUB_BASE_REF= 2025-12-04T09:42:53.3990630Z INSTALLED_ACL= 2025-12-04T09:42:53.3991173Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T09:42:53.3991794Z 
CI=true 2025-12-04T09:42:53.3992040Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:42:53.3992417Z RUST_LOG=sccache::server=error 2025-12-04T09:42:53.3992733Z JOB_ID=57119749427 2025-12-04T09:42:53.3992986Z GITHUB_HEAD_REF= 2025-12-04T09:42:53.3993249Z GITHUB_ACTION_REF= 2025-12-04T09:42:53.3993587Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:42:53.3993986Z TEST_SHOWLOCALS=False 2025-12-04T09:42:53.3994281Z GITHUB_WORKFLOW=periodic 2025-12-04T09:42:53.3994592Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:42:53.3995329Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_12f782ac-3486-4605-947a-3e1e053e632a 2025-12-04T09:42:53.3996077Z NO_TD=False 2025-12-04T09:42:53.3996346Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:42:53.3996700Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:42:53.3997221Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:42:53.3997729Z _=/usr/bin/env 2025-12-04T09:42:53.3997995Z + echo 'Testing pytorch' 2025-12-04T09:42:53.3998281Z Testing pytorch 2025-12-04T09:42:53.3998659Z + export LANG=C.UTF-8 2025-12-04T09:42:53.3998950Z + LANG=C.UTF-8 2025-12-04T09:42:53.3999191Z + PR_NUMBER= 2025-12-04T09:42:53.3999482Z + [[ legacy_nvidia_driver == \d\e\f\a\u\l\t ]] 2025-12-04T09:42:53.3999918Z + [[ legacy_nvidia_driver == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:42:53.4000329Z + [[ legacy_nvidia_driver == \s\l\o\w ]] 2025-12-04T09:42:53.4000795Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:42:53.4001575Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:42:53.4002230Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:42:53.4002626Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:42:53.4003007Z + [[ legacy_nvidia_driver == *crossref* ]] 2025-12-04T09:42:53.4003428Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:42:53.4003864Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]] 
2025-12-04T09:42:53.4004327Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:42:53.4004747Z + pip_install ninja==1.10.2 2025-12-04T09:42:53.4005168Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:42:53.4005716Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:42:53.8328753Z Collecting ninja==1.10.2 2025-12-04T09:42:53.8598345Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:42:53.8713637Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:42:54.2993211Z Installing collected packages: ninja 2025-12-04T09:42:54.2993707Z Attempting uninstall: ninja 2025-12-04T09:42:54.3001934Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:42:54.3026174Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:42:54.3094528Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:42:54.3475943Z Successfully installed ninja-1.10.2 2025-12-04T09:42:54.4145433Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:42:54.4147396Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:42:54.4148709Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:42:54.4149201Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *asan* ]] 2025-12-04T09:42:54.4149774Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:42:54.4150260Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:42:54.4150917Z + echo 'We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. 
Expect the assertion to pass' 2025-12-04T09:42:54.4151748Z We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass 2025-12-04T09:42:54.4152322Z + cd test 2025-12-04T09:42:54.4152721Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:42:56.1655108Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:42:56.1655654Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:42:56.1656189Z + [[ legacy_nvidia_driver == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:42:56.1657424Z + cd test 2025-12-04T09:42:56.1658523Z + python -c 'import torch; torch.rand(2, 2, device='\''cuda'\'')' 2025-12-04T09:43:01.0771033Z + export USE_LEGACY_DRIVER=1 2025-12-04T09:43:01.0771446Z + USE_LEGACY_DRIVER=1 2025-12-04T09:43:01.0777394Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:43:01.0778560Z + [[ legacy_nvidia_driver == *pr_time_benchmarks* ]] 2025-12-04T09:43:01.0779003Z + [[ legacy_nvidia_driver == *dynamo_eager* ]] 2025-12-04T09:43:01.0779414Z + [[ legacy_nvidia_driver == *aot_eager* ]] 2025-12-04T09:43:01.0779834Z + [[ legacy_nvidia_driver == *aot_inductor* ]] 2025-12-04T09:43:01.0780255Z + [[ legacy_nvidia_driver == *max_autotune_inductor* ]] 2025-12-04T09:43:01.0780954Z + [[ legacy_nvidia_driver == *inductor* ]] 2025-12-04T09:43:01.0781355Z + [[ legacy_nvidia_driver == *dynamic* ]] 2025-12-04T09:43:01.0781736Z + [[ legacy_nvidia_driver == *cpu* ]] 2025-12-04T09:43:01.0782084Z + [[ legacy_nvidia_driver == *xpu* ]] 2025-12-04T09:43:01.0782474Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:43:01.0816449Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:43:01.0816931Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:43:01.0820122Z + cd test 2025-12-04T09:43:01.0821055Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:43:03.8849646Z PyTorch built with: 2025-12-04T09:43:03.8849994Z - GCC 11.4 
2025-12-04T09:43:03.8850306Z - C++ Version: 201703 2025-12-04T09:43:03.8851050Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:43:03.8851976Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:43:03.8852581Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:43:03.8852983Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:43:03.8853440Z - NNPACK is enabled 2025-12-04T09:43:03.8853736Z - CPU capability usage: AVX512 2025-12-04T09:43:03.8854131Z - CUDA Runtime 12.4 2025-12-04T09:43:03.8854536Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75 2025-12-04T09:43:03.8855051Z - CuDNN 90.1 2025-12-04T09:43:03.8861388Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:43:03.8868119Z 
2025-12-04T09:43:04.2934604Z + cd test 2025-12-04T09:43:04.2935059Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:43:05.7493155Z ATen/Parallel: 2025-12-04T09:43:05.7493524Z at::get_num_threads() : 8 2025-12-04T09:43:05.7493879Z at::get_num_interop_threads() : 8 2025-12-04T09:43:05.7494239Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2025-12-04T09:43:05.7494600Z omp_get_max_threads() : 8 2025-12-04T09:43:05.7495273Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:43:05.7495978Z mkl_get_max_threads() : 8 2025-12-04T09:43:05.7496425Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:43:05.7496943Z std::thread::hardware_concurrency() : 16 2025-12-04T09:43:05.7497321Z Environment variables: 2025-12-04T09:43:05.7497620Z OMP_NUM_THREADS : [not set] 2025-12-04T09:43:05.7497945Z MKL_NUM_THREADS : [not set] 2025-12-04T09:43:05.7498270Z ATen parallel backend: OpenMP 2025-12-04T09:43:05.7498485Z 2025-12-04T09:43:06.0700478Z + [[ legacy_nvidia_driver == *numpy_2* ]] 2025-12-04T09:43:06.0701322Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:43:06.0701780Z + [[ legacy_nvidia_driver == *backward* ]] 2025-12-04T09:43:06.0702235Z + [[ legacy_nvidia_driver == *libtorch_agnostic_targetting* ]] 2025-12-04T09:43:06.0703025Z + [[ legacy_nvidia_driver == *xla* ]] 2025-12-04T09:43:06.0703403Z + [[ legacy_nvidia_driver == *vllm* ]] 2025-12-04T09:43:06.0703789Z + [[ legacy_nvidia_driver == *executorch* ]] 2025-12-04T09:43:06.0704199Z + [[ legacy_nvidia_driver == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:43:06.0704653Z + [[ legacy_nvidia_driver == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:43:06.0705133Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:43:06.0705565Z + [[ legacy_nvidia_driver == distributed ]] 2025-12-04T09:43:06.0706187Z + [[ legacy_nvidia_driver == *operator_benchmark* ]] 
2025-12-04T09:43:06.0706823Z + [[ legacy_nvidia_driver == *operator_microbenchmark* ]] 2025-12-04T09:43:06.0707313Z + [[ legacy_nvidia_driver == *attention_microbenchmark* ]] 2025-12-04T09:43:06.0707809Z + [[ legacy_nvidia_driver == *inductor_distributed* ]] 2025-12-04T09:43:06.0708261Z + [[ legacy_nvidia_driver == *inductor-halide* ]] 2025-12-04T09:43:06.0708705Z + [[ legacy_nvidia_driver == *inductor-pallas* ]] 2025-12-04T09:43:06.0709165Z + [[ legacy_nvidia_driver == *inductor-triton-cpu* ]] 2025-12-04T09:43:06.0709655Z + [[ legacy_nvidia_driver == *inductor-micro-benchmark* ]] 2025-12-04T09:43:06.0710188Z + [[ legacy_nvidia_driver == *aoti_cross_compile_for_windows* ]] 2025-12-04T09:43:06.0710657Z + [[ legacy_nvidia_driver == *huggingface* ]] 2025-12-04T09:43:06.0711044Z + [[ legacy_nvidia_driver == *timm* ]] 2025-12-04T09:43:06.0711418Z + [[ legacy_nvidia_driver == cachebench ]] 2025-12-04T09:43:06.0711809Z + [[ legacy_nvidia_driver == verify_cachebench ]] 2025-12-04T09:43:06.0712231Z + [[ legacy_nvidia_driver == *torchbench* ]] 2025-12-04T09:43:06.0712659Z + [[ legacy_nvidia_driver == *inductor_cpp_wrapper* ]] 2025-12-04T09:43:06.0713103Z + [[ legacy_nvidia_driver == *inductor_core* ]] 2025-12-04T09:43:06.0713487Z + [[ legacy_nvidia_driver == *inductor* ]] 2025-12-04T09:43:06.0713868Z + [[ legacy_nvidia_driver == *einops* ]] 2025-12-04T09:43:06.0714257Z + [[ legacy_nvidia_driver == *dynamo_core* ]] 2025-12-04T09:43:06.0714660Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]] 2025-12-04T09:43:06.0715100Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:43:06.0715486Z + [[ 4 == 1 ]] 2025-12-04T09:43:06.0715717Z + [[ 4 == 2 ]] 2025-12-04T09:43:06.0715970Z + [[ 4 -gt 2 ]] 2025-12-04T09:43:06.0716234Z + install_torchvision 2025-12-04T09:43:06.0716520Z + local orig_preload 2025-12-04T09:43:06.0716802Z + local commit 2025-12-04T09:43:06.0717066Z ++ get_pinned_commit vision 2025-12-04T09:43:06.0717392Z ++ cat .github/ci_commit_pins/vision.txt 
2025-12-04T09:43:06.0721360Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:06.0721881Z + orig_preload= 2025-12-04T09:43:06.0722349Z + '[' -n '' ']' 2025-12-04T09:43:06.0722765Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:43:06.0723176Z + export FORCE_CUDA=1 2025-12-04T09:43:06.0723460Z + FORCE_CUDA=1 2025-12-04T09:43:06.0723702Z + export WITH_CUDA=1 2025-12-04T09:43:06.0723985Z + WITH_CUDA=1 2025-12-04T09:43:06.0724658Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision 2025-12-04T09:43:06.0725715Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:06.0726375Z + local wheel_dir=dist/vision 2025-12-04T09:43:06.0726699Z + local found_whl=0 2025-12-04T09:43:06.0726990Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:43:06.0727327Z + [[ -f dist/vision/*.whl ]] 2025-12-04T09:43:06.0727627Z + '[' 0 == 0 ']' 2025-12-04T09:43:06.0728435Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:06.4383413Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:06.4388710Z Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-tejf4bas 2025-12-04T09:43:06.4569215Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-tejf4bas 2025-12-04T09:43:08.1510177Z Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e' 2025-12-04T09:43:08.1536390Z Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:43:08.2660297Z Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e 
2025-12-04T09:43:11.6868084Z Preparing metadata (pyproject.toml) ... done 2025-12-04T09:43:11.6906792Z Building wheels for collected packages: torchvision 2025-12-04T09:44:42.1779567Z Building wheel for torchvision (pyproject.toml) ... done 2025-12-04T09:44:42.1845650Z Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1821672 sha256=2ba3e74afda71e3592904b780596e0d10594a173250b8abb15e1f83b61107b7c 2025-12-04T09:44:42.1848082Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac 2025-12-04T09:44:42.1889798Z Successfully built torchvision 2025-12-04T09:44:42.2796173Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:44:42.2796881Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:44:42.2797723Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl') 2025-12-04T09:44:42.2798279Z + local args 2025-12-04T09:44:42.2798744Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]] 2025-12-04T09:44:42.2799323Z + for path in "${args[@]}" 2025-12-04T09:44:42.2799877Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl' 2025-12-04T09:44:42.2800694Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:44:42.2801785Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:44:42.6472220Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl 2025-12-04T09:44:42.6612264Z Installing collected packages: torchvision 2025-12-04T09:44:43.1845660Z Successfully installed torchvision-0.25.0a0+617079d 2025-12-04T09:44:43.2300240Z + '[' -n '' ']' 2025-12-04T09:44:43.2300559Z +
test_python_shard 4 2025-12-04T09:44:43.2301042Z + [[ -z 5 ]] 2025-12-04T09:44:43.2302030Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 4 5 --verbose --upload-artifacts-while-running 2025-12-04T09:44:50.1796820Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:44:50.2256507Z Ignoring disabled issues: [''] 2025-12-04T09:44:50.2371562Z Found test times from artifacts 2025-12-04T09:44:50.2817382Z Found test times from artifacts 2025-12-04T09:44:50.2833235Z Running all tests 2025-12-04T09:44:50.3740250Z Running parallel tests on 1 processes 2025-12-04T09:44:50.3752201Z Name: tests to run (est. time: 289.55min) 2025-12-04T09:44:50.3752598Z Serial tests (139): 2025-12-04T09:44:50.3752911Z inductor/test_aot_inductor 4/6 2025-12-04T09:44:50.3753373Z inductor/test_torchinductor_dynamic_shapes 1/5 2025-12-04T09:44:50.3753861Z inductor/test_torchinductor_dynamic_shapes 5/5 2025-12-04T09:44:50.3754271Z inductor/test_kernel_benchmark 1/1 2025-12-04T09:44:50.3754664Z inductor/test_torchinductor_opinfo 3/17 2025-12-04T09:44:50.3755073Z inductor/test_torchinductor_opinfo 8/17 2025-12-04T09:44:50.3755469Z inductor/test_torchinductor_opinfo 13/17 2025-12-04T09:44:50.3755866Z inductor/test_pattern_matcher 1/1 2025-12-04T09:44:50.3756231Z inductor/test_cuda_repro 1/1 2025-12-04T09:44:50.3756856Z inductor/test_cudagraph_trees 1/1 2025-12-04T09:44:50.3757244Z inductor/test_cuda_select_algorithm 4/5 2025-12-04T09:44:50.3757635Z inductor/test_deterministic 1/8 2025-12-04T09:44:50.3757981Z inductor/test_deterministic 6/8 2025-12-04T09:44:50.3758346Z inductor/test_extension_backend 1/1 2025-12-04T09:44:50.3758723Z inductor/test_native_matmul 1/2 2025-12-04T09:44:50.3759084Z dynamo/test_fx_graph_runnable 1/1 2025-12-04T09:44:50.3759430Z inductor/test_memory 1/1 2025-12-04T09:44:50.3759904Z dynamo/test_streams 1/1 
2025-12-04T09:44:50.3760235Z inductor/test_unbacked_symints 1/1 2025-12-04T09:44:50.3760611Z inductor/test_scatter_optimization 1/1 2025-12-04T09:44:50.3761004Z inductor/test_mix_order_reduction 1/2 2025-12-04T09:44:50.3761374Z test_transformers 1/1 2025-12-04T09:44:50.3761663Z test_autograd 1/1 2025-12-04T09:44:50.3762019Z test_sparse 1/2 2025-12-04T09:44:50.3762287Z test_decomp 2/17 2025-12-04T09:44:50.3762555Z test_decomp 7/17 2025-12-04T09:44:50.3762834Z test_decomp 12/17 2025-12-04T09:44:50.3763113Z test_decomp 17/17 2025-12-04T09:44:50.3763377Z test_meta 5/5 2025-12-04T09:44:50.3763650Z test_nestedtensor 1/4 2025-12-04T09:44:50.3763963Z test_nestedtensor 4/4 2025-12-04T09:44:50.3764247Z test_ops 5/11 2025-12-04T09:44:50.3764515Z test_ops 10/11 2025-12-04T09:44:50.3764795Z functorch/test_ops 2/7 2025-12-04T09:44:50.3765100Z functorch/test_ops 7/7 2025-12-04T09:44:50.3765428Z inductor/test_max_autotune 1/1 2025-12-04T09:44:50.3765788Z inductor/test_cpu_repro 3/3 2025-12-04T09:44:50.3766146Z inductor/test_mkldnn_pattern_matcher 2/3 2025-12-04T09:44:50.3766534Z inductor/test_cpu_select_algorithm 1/1 2025-12-04T09:44:50.3766901Z test_custom_ops 1/1 2025-12-04T09:44:50.3767200Z inductor/test_analysis 1/1 2025-12-04T09:44:50.3767519Z inductor/test_pad_mm 1/1 2025-12-04T09:44:50.3767848Z inductor/test_triton_syntax 1/1 2025-12-04T09:44:50.3768233Z inductor/test_triton_extension_backend 1/1 2025-12-04T09:44:50.3768622Z test_sparse_semi_structured 1/1 2025-12-04T09:44:50.3768987Z inductor/test_op_completeness 1/1 2025-12-04T09:44:50.3769359Z inductor/test_subgraph_choice 1/1 2025-12-04T09:44:50.3769720Z inductor/test_cutedsl_grouped_mm 1/1 2025-12-04T09:44:50.3770106Z inductor/test_cpp_wrapper_hipify 1/1 2025-12-04T09:44:50.3770487Z inductor/test_inductor_utils 1/1 2025-12-04T09:44:50.3770878Z inductor/test_template_heuristics_registry 1/1 2025-12-04T09:44:50.3771301Z inductor/test_async_compile 1/1 2025-12-04T09:44:50.3771662Z dynamo/test_deque_reconstruct 1/1 
2025-12-04T09:44:50.3772020Z inductor/test_utils 1/1 2025-12-04T09:44:50.3772327Z inductor/test_indexing 1/1 2025-12-04T09:44:50.3772676Z inductor/test_inductor_annotations 1/1 2025-12-04T09:44:50.3773059Z inductor/test_compile_worker 1/1 2025-12-04T09:44:50.3773398Z dynamo/test_einops 1/1 2025-12-04T09:44:50.3773731Z inductor/test_external_callables 1/1 2025-12-04T09:44:50.3774092Z test_testing 1/1 2025-12-04T09:44:50.3774378Z dynamo/test_fx_passes_pre_grad 1/1 2025-12-04T09:44:50.3774748Z export/test_strict_export_v2 1/1 2025-12-04T09:44:50.3775138Z export/test_functionalized_assertions 1/1 2025-12-04T09:44:50.3775529Z inductor/test_selective_lowering 1/1 2025-12-04T09:44:50.3775907Z dynamo/test_base_output 1/1 2025-12-04T09:44:50.3776248Z inductor/test_lookup_table 1/1 2025-12-04T09:44:50.3776602Z export/test_serialize 1/1 2025-12-04T09:44:50.3776960Z inductor/test_move_constructors_to_gpu 1/1 2025-12-04T09:44:50.3777353Z inductor/test_remote_cache 1/1 2025-12-04T09:44:50.3777726Z inductor/test_coordinate_descent_tuner 1/1 2025-12-04T09:44:50.3778110Z inductor/test_inplace_padding 1/1 2025-12-04T09:44:50.3778479Z inductor/test_cudacodecache 1/1 2025-12-04T09:44:50.3778841Z inductor/test_minifier_utils 1/1 2025-12-04T09:44:50.3779183Z inductor/test_debug_trace 1/1 2025-12-04T09:44:50.3779644Z inductor/test_foreach 1/1 2025-12-04T09:44:50.3779975Z inductor/test_cache 1/1 2025-12-04T09:44:50.3780278Z dynamo/test_config 1/1 2025-12-04T09:44:50.3780609Z dynamo/test_metrics_context 1/1 2025-12-04T09:44:50.3780965Z export/test_package 1/1 2025-12-04T09:44:50.3781270Z dynamo/test_nops 1/1 2025-12-04T09:44:50.3781645Z inductor/test_graph_transform_observer 1/1 2025-12-04T09:44:50.3782130Z export/test_db 1/1 2025-12-04T09:44:50.3782477Z dynamo/test_export_mutations 1/1 2025-12-04T09:44:50.3782971Z inductor/test_config 1/1 2025-12-04T09:44:50.3783311Z inductor/test_dependencies 1/1 2025-12-04T09:44:50.3783665Z inductor/test_fuzzer 1/1 2025-12-04T09:44:50.3783974Z 
dynamo/test_global 1/1 2025-12-04T09:44:50.3784296Z inductor/test_control_flow 1/4 2025-12-04T09:44:50.3784650Z dynamo/test_cudagraphs 1/1 2025-12-04T09:44:50.3784974Z inductor/test_alignment 1/1 2025-12-04T09:44:50.3785313Z dynamo/test_profiler 1/1 2025-12-04T09:44:50.3785664Z dynamo/test_guard_serialization 1/1 2025-12-04T09:44:50.3786019Z dynamo/test_dicts 1/1 2025-12-04T09:44:50.3786336Z dynamo/test_optimizers 1/1 2025-12-04T09:44:50.3786670Z export/test_torchbind 1/1 2025-12-04T09:44:50.3787007Z dynamo/test_python_dispatcher 1/1 2025-12-04T09:44:50.3787365Z export/test_swap 1/1 2025-12-04T09:44:50.3787672Z export/test_unflatten 1/1 2025-12-04T09:44:50.3788001Z dynamo/test_verify_correctness 1/1 2025-12-04T09:44:50.3788375Z inductor/test_fxir_backend 1/1 2025-12-04T09:44:50.3788740Z dynamo/test_structured_trace 1/1 2025-12-04T09:44:50.3789101Z dynamo/test_torchrec 1/1 2025-12-04T09:44:50.3789423Z test_model_exports_to_core_aten 1/1 2025-12-04T09:44:50.3789800Z dynamo/test_precompile_context 1/1 2025-12-04T09:44:50.3790168Z dynamo/test_trace_rules 1/1 2025-12-04T09:44:50.3790482Z export/test_upgrader 1/1 2025-12-04T09:44:50.3790797Z dynamo/test_hooks 1/1 2025-12-04T09:44:50.3791107Z dynamo/test_generator 1/1 2025-12-04T09:44:50.3791419Z export/test_verifier 1/1 2025-12-04T09:44:50.3791739Z export/test_sparse 2/2 2025-12-04T09:44:50.3792054Z functorch/test_ac 1/1 2025-12-04T09:44:50.3792346Z test_out_dtype_op 1/1 2025-12-04T09:44:50.3792664Z torch_np/test_ufuncs_basic 1/1 2025-12-04T09:44:50.3793023Z lazy/test_step_closures 1/1 2025-12-04T09:44:50.3793388Z functorch/dim/test_getsetitem 1/1 2025-12-04T09:44:50.3793872Z test_fx 1/1 2025-12-04T09:44:50.3794137Z test_autocast 1/1 2025-12-04T09:44:50.3794421Z test_logging 1/1 2025-12-04T09:44:50.3794693Z test_python_dispatch 1/1 2025-12-04T09:44:50.3795011Z nn/test_lazy_modules 1/1 2025-12-04T09:44:50.3795321Z nn/test_pruning 1/1 2025-12-04T09:44:50.3795591Z test_monitor 1/1 2025-12-04T09:44:50.3795873Z 
test_cuda_sanitizer 1/1 2025-12-04T09:44:50.3796191Z test_bundled_inputs 1/1 2025-12-04T09:44:50.3796524Z torch_np/numpy_tests/core/test_numeric 1/1 2025-12-04T09:44:50.3796959Z torch_np/numpy_tests/core/test_multiarray 1/1 2025-12-04T09:44:50.3797349Z test_itt 1/1 2025-12-04T09:44:50.3797657Z torch_np/numpy_tests/lib/test_function_base 1/1 2025-12-04T09:44:50.3798056Z test_masked 1/1 2025-12-04T09:44:50.3798337Z optim/test_lrscheduler 1/1 2025-12-04T09:44:50.3798647Z test_datapipe 1/1 2025-12-04T09:44:50.3798938Z nn/test_convolution 1/1 2025-12-04T09:44:50.3799244Z test_indexing 1/1 2025-12-04T09:44:50.3799552Z torch_np/numpy_tests/fft/test_pocketfft 1/1 2025-12-04T09:44:50.3799993Z torch_np/numpy_tests/lib/test_shape_base_ 1/1 2025-12-04T09:44:50.3800400Z test_cpp_extensions_jit 1/1 2025-12-04T09:44:50.3800746Z profiler/test_python_tracer 1/1 2025-12-04T09:44:50.3801638Z cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 2025-12-04T09:44:50.3802303Z distributions/test_distributions 1/1 2025-12-04T09:44:50.3802682Z Parallel tests (0): 2025-12-04T09:44:50.3802974Z Name: excluded (est. time: 0.0min) 2025-12-04T09:44:50.3803478Z Serial tests (0): 2025-12-04T09:44:50.3803757Z Parallel tests (0): 2025-12-04T09:44:50.3804242Z Running inductor/test_aot_inductor 4/6 ... [2025-12-04 09:44:50.376192][1847.986095395] 2025-12-04T09:44:50.3804818Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:44:50.3806093Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=4', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:44:50.376617] 2025-12-04T09:53:09.3152687Z 2025-12-04T09:53:09.3153576Z PRINTING LOG FILE of inductor/test_aot_inductor 4/6 (test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log) 2025-12-04T09:53:09.3154711Z W1204 09:45:03.012000 1725 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.3155905Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml 2025-12-04T09:53:09.3156801Z ============================= test session starts ============================== 2025-12-04T09:53:09.3157468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:53:09.3158076Z cachedir: .pytest_cache 2025-12-04T09:53:09.3158800Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:53:09.3159665Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:53:09.3160045Z configfile: pytest.ini 2025-12-04T09:53:09.3160869Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:53:09.3161771Z collecting ... 
collected 934 items 2025-12-04T09:53:09.3162245Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:53:09.3250342Z Running 152 items in this shard: test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicate_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_on_disk_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps 2025-12-04T09:53:09.3337086Z 2025-12-04T09:53:09.3338302Z inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp W1204 09:45:04.906000 1725 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.link_libtorch=False when aot_inductor_mode.compile_standalone is True. 
2025-12-04T09:53:09.3340340Z W1204 09:45:04.906000 1725 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True. 2025-12-04T09:53:09.3341313Z PASSED [0.0050s] [ 0%] 2025-12-04T09:53:09.3342255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu <- test/inductor/test_torchinductor.py PASSED [18.5029s] [ 1%] 2025-12-04T09:53:09.3343661Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu PASSED [0.0066s] [ 1%] 2025-12-04T09:53:09.3345149Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 2%] 2025-12-04T09:53:09.3347363Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu W1204 09:45:23.438000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3349512Z W1204 09:45:23.438000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3350410Z PASSED [5.5119s] [ 3%] 2025-12-04T09:53:09.3351304Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu <- test/inductor/test_torchinductor.py PASSED [5.4373s] [ 3%] 2025-12-04T09:53:09.3353408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu W1204 09:45:34.390000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:53:09.3355534Z W1204 09:45:34.391000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3356979Z W1204 09:45:34.391000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3357881Z PASSED [5.6927s] [ 4%] 2025-12-04T09:53:09.3359471Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu <- test/inductor/test_torchinductor.py W1204 09:45:40.158000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3361673Z W1204 09:45:40.158000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3363205Z W1204 09:45:40.159000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3364096Z PASSED [5.5548s] [ 5%] 2025-12-04T09:53:09.3365732Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu <- test/inductor/test_torchinductor.py W1204 09:45:45.760000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3367965Z W1204 09:45:45.760000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:53:09.3368872Z PASSED [5.8295s] [ 5%] 2025-12-04T09:53:09.3369728Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu <- test/inductor/test_torchinductor.py PASSED [7.0993s] [ 6%] 2025-12-04T09:53:09.3371194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu <- test/inductor/test_torchinductor.py PASSED [5.0658s] [ 7%] 2025-12-04T09:53:09.3372500Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu PASSED [5.1488s] [ 7%] 2025-12-04T09:53:09.3373924Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ devices) [ 8%] 2025-12-04T09:53:09.3375444Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu <- test/inductor/test_torchinductor.py PASSED [5.1422s] [ 9%] 2025-12-04T09:53:09.3376963Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu <- test/inductor/test_torchinductor.py PASSED [0.0282s] [ 9%] 2025-12-04T09:53:09.3378660Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (requires GPU) [ 10%] 2025-12-04T09:53:09.3380095Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu PASSED [8.1427s] [ 11%] 2025-12-04T09:53:09.3381443Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [ 11%] 2025-12-04T09:53:09.3383225Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu SKIPPED [0.0002s] (Skipping triton backend only since not big GPU (not enough SM)) [ 12%] 2025-12-04T09:53:09.3385270Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu <- test/inductor/test_torchinductor.py W1204 09:46:22.519000 1725 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T09:53:09.3386555Z PASSED [10.1890s] [ 13%] 2025-12-04T09:53:09.3387821Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu <- test/inductor/test_torchinductor.py W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T09:53:09.3389436Z W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! | 2025-12-04T09:53:09.3390293Z W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T09:53:09.3392015Z W1204 09:46:37.584000 1725 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 2025-12-04T09:53:09.3394647Z W1204 09:46:37.585000 1725 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 
2025-12-04T09:53:09.3396164Z PASSED [8.7186s] [ 13%] 2025-12-04T09:53:09.3397022Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 14%] 2025-12-04T09:53:09.3398540Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu <- test/inductor/test_torchinductor.py PASSED [4.9339s] [ 15%] 2025-12-04T09:53:09.3400146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu <- test/inductor/test_torchinductor.py PASSED [5.0270s] [ 15%] 2025-12-04T09:53:09.3401811Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu <- test/inductor/test_torchinductor.py PASSED [4.9238s] [ 16%] 2025-12-04T09:53:09.3403366Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu <- test/inductor/test_torchinductor.py PASSED [4.9819s] [ 17%] 2025-12-04T09:53:09.3404895Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu <- test/inductor/test_torchinductor.py PASSED [4.9382s] [ 17%] 2025-12-04T09:53:09.3406431Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.4949s] [ 18%] 2025-12-04T09:53:09.3407946Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.1186s] [ 19%] 2025-12-04T09:53:09.3409244Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu PASSED [5.0945s] [ 19%] 2025-12-04T09:53:09.3410512Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu <- test/inductor/test_torchinductor.py PASSED [5.0660s] [ 20%] 2025-12-04T09:53:09.3411976Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu <- test/inductor/test_torchinductor.py PASSED [5.0445s] [ 
21%] 2025-12-04T09:53:09.3413492Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu SKIPPED [0.0031s] (requires GPU) [ 21%] 2025-12-04T09:53:09.3415133Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu SKIPPED [0.0030s] (requires GPU) [ 22%] 2025-12-04T09:53:09.3416944Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu SKIPPED [0.0030s] (requires GPU) [ 23%] 2025-12-04T09:53:09.3418672Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu SKIPPED [0.0028s] (requires GPU) [ 23%] 2025-12-04T09:53:09.3420372Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 24%] 2025-12-04T09:53:09.3422156Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu SKIPPED [0.0028s] (requires GPU) [ 25%] 2025-12-04T09:53:09.3423856Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 25%] 2025-12-04T09:53:09.3425578Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 26%] 2025-12-04T09:53:09.3427255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu SKIPPED [0.0027s] (requires GPU) [ 26%] 2025-12-04T09:53:09.3428923Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu SKIPPED [0.0027s] (requires GPU) [ 27%] 2025-12-04T09:53:09.3430618Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu SKIPPED [0.0030s] (requires GPU) [ 28%] 2025-12-04T09:53:09.3432310Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu SKIPPED [0.0027s] (requires GPU) [ 28%] 2025-12-04T09:53:09.3434026Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu SKIPPED [0.0027s] (requires GPU) [ 29%] 2025-12-04T09:53:09.3435785Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu Error: Expected u1 >= 1 but received 0 2025-12-04T09:53:09.3436829Z PASSED [10.3166s] [ 30%] 2025-12-04T09:53:09.3437914Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu SKIPPED [0.0031s] (Need triton for user-defined triton kernel) [ 30%] 2025-12-04T09:53:09.3439835Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu SKIPPED [0.0029s] (Need triton for user-defined triton kernel) [ 31%] 2025-12-04T09:53:09.3441762Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu SKIPPED [0.0028s] (Need triton for user-defined triton kernel) [ 32%] 2025-12-04T09:53:09.3443581Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu <- test/inductor/test_torchinductor.py PASSED [5.0463s] [ 32%] 2025-12-04T09:53:09.3445143Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu <- test/inductor/test_torchinductor.py PASSED [5.2486s] [ 33%] 2025-12-04T09:53:09.3446603Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu PASSED [5.7203s] [ 34%] 2025-12-04T09:53:09.3448596Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu W1204 09:47:58.251000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3450731Z W1204 09:47:58.251000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3451639Z PASSED [5.7703s] [ 34%] 2025-12-04T09:53:09.3452590Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda <- test/inductor/test_torchinductor.py PASSED [6.7599s] [ 35%] 2025-12-04T09:53:09.3454193Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda <- test/inductor/test_torchinductor.py PASSED [11.5150s] [ 36%] 2025-12-04T09:53:09.3455893Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda <- test/inductor/test_torchinductor.py PASSED [6.3204s] [ 36%] 2025-12-04T09:53:09.3457455Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicate_cuda <- test/inductor/test_torchinductor.py PASSED [6.2938s] [ 37%] 2025-12-04T09:53:09.3459902Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda <- test/inductor/test_torchinductor.py W1204 09:48:34.826000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:53:09.3462014Z W1204 09:48:34.826000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.3462912Z PASSED [6.4230s] [ 38%] 2025-12-04T09:53:09.3463789Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda <- test/inductor/test_torchinductor.py PASSED [6.5759s] [ 38%] 2025-12-04T09:53:09.3465252Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda PASSED [6.2088s] [ 39%] 2025-12-04T09:53:09.3467178Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda <- test/inductor/test_torchinductor.py W1204 09:48:55.538000 1725 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.3468554Z PASSED [6.9521s] [ 40%] 2025-12-04T09:53:09.3469408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda <- test/inductor/test_torchinductor.py PASSED [12.8475s] [ 40%] 2025-12-04T09:53:09.3471222Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda <- test/inductor/test_torchinductor.py W1204 09:49:13.890000 1725 site-packages/torch/_inductor/utils.py:2565] DeviceCopy in input program 2025-12-04T09:53:09.3472464Z PASSED [6.4041s] [ 41%] 2025-12-04T09:53:09.3473478Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1789s] [ 42%] 2025-12-04T09:53:09.3475260Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1410s] [ 42%] 2025-12-04T09:53:09.3476954Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1403s] [ 42%] 2025-12-04T09:53:09.3477828Z 2025-12-04T09:53:09.3477978Z ==================================== RERUNS ==================================== 2025-12-04T09:53:09.3478609Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3479197Z Traceback (most recent call last): 2025-12-04T09:53:09.3479853Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3480511Z return value(self) 2025-12-04T09:53:09.3481186Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3481957Z self.check_model(model, inps) 2025-12-04T09:53:09.3482695Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3483499Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3484113Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3484798Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3485486Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3486230Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3487103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3487973Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3488786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3489561Z raise e 2025-12-04T09:53:09.3490252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in 
aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3491038Z return func( 2025-12-04T09:53:09.3491757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3492671Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3493511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3494229Z return compile_fx_aot( 2025-12-04T09:53:09.3494922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3495685Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3496408Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3497133Z return compile_fx( 2025-12-04T09:53:09.3497782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3498536Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3499387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3500206Z return _compile_fx_main( 2025-12-04T09:53:09.3501094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3501963Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3502835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3503643Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3504439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 
2025-12-04T09:53:09.3505214Z return compile_fx_forward( 2025-12-04T09:53:09.3505952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3506711Z return inner_compile( 2025-12-04T09:53:09.3507191Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3507730Z return func(*args, **kwds) 2025-12-04T09:53:09.3508438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3509359Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3510263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3511084Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3512030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3512883Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3513708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3514507Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3515330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3516411Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3517407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3518200Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3519011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in 
_check_triton_bf16_support 2025-12-04T09:53:09.3519821Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3520557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3521328Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3521857Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3522311Z 2025-12-04T09:53:09.3522529Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3523544Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3524329Z 2025-12-04T09:53:09.3524612Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3525254Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3525718Z unimplemented [] 2025-12-04T09:53:09.3526048Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3526443Z inductor [] 2025-12-04T09:53:09.3526676Z graph_break [] 2025-12-04T09:53:09.3527049Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3528235Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.3529294Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3530264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3531236Z warnings.warn( 2025-12-04T09:53:09.3531736Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3532323Z Traceback (most recent call last): 2025-12-04T09:53:09.3532969Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3533626Z return value(self) 2025-12-04T09:53:09.3534300Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3535069Z self.check_model(model, inps) 2025-12-04T09:53:09.3535737Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3536438Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3537044Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3537723Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3538407Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3539148Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3540099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3540905Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3541714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3542484Z raise e 2025-12-04T09:53:09.3543175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in 
aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3543956Z return func( 2025-12-04T09:53:09.3544741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3545657Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3546495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3547211Z return compile_fx_aot( 2025-12-04T09:53:09.3547904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3548674Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3549399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3550123Z return compile_fx( 2025-12-04T09:53:09.3550776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3551530Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3552376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3553201Z return _compile_fx_main( 2025-12-04T09:53:09.3553920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3554776Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3555642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3556449Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3557248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 
2025-12-04T09:53:09.3558018Z return compile_fx_forward( 2025-12-04T09:53:09.3558756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3559523Z return inner_compile( 2025-12-04T09:53:09.3560003Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3560545Z return func(*args, **kwds) 2025-12-04T09:53:09.3561249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3562228Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3563146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3563968Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3564777Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3565632Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3566481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3567281Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3568085Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3569189Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3570192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3570976Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3571785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in 
_check_triton_bf16_support 2025-12-04T09:53:09.3572610Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3573410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3574174Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3574701Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3575088Z 2025-12-04T09:53:09.3575319Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3576331Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3577121Z 2025-12-04T09:53:09.3577392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3578029Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3578508Z unimplemented [] 2025-12-04T09:53:09.3578826Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3579221Z inductor [] 2025-12-04T09:53:09.3579471Z graph_break [] 2025-12-04T09:53:09.3579833Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3581020Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.3582094Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3583063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3584032Z warnings.warn( 2025-12-04T09:53:09.3584418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3584894Z unimplemented [] 2025-12-04T09:53:09.3585219Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3585601Z inductor [] 2025-12-04T09:53:09.3585844Z graph_break [] 2025-12-04T09:53:09.3586222Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3587392Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3588467Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3589427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3590405Z warnings.warn( 2025-12-04T09:53:09.3590708Z =================================== FAILURES =================================== 2025-12-04T09:53:09.3591337Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3591937Z Traceback (most recent call last): 2025-12-04T09:53:09.3592564Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3593218Z return value(self) 2025-12-04T09:53:09.3593907Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3594675Z self.check_model(model, inps) 2025-12-04T09:53:09.3595331Z File 
"/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3596042Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3596743Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3597411Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3598101Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3598863Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3599737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3600595Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3601886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3602851Z raise e 2025-12-04T09:53:09.3603525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3604311Z return func( 2025-12-04T09:53:09.3605038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3605972Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3606799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3607520Z return compile_fx_aot( 2025-12-04T09:53:09.3608225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3608980Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3609705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 
2025-12-04T09:53:09.3610426Z return compile_fx( 2025-12-04T09:53:09.3611079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3611827Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3612674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3613502Z return _compile_fx_main( 2025-12-04T09:53:09.3614220Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3615054Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3615919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3616738Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3617520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3618292Z return compile_fx_forward( 2025-12-04T09:53:09.3619036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3619811Z return inner_compile( 2025-12-04T09:53:09.3620283Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3620826Z return func(*args, **kwds) 2025-12-04T09:53:09.3621546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3622443Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3623359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3624180Z inner_compiled_fn = compiler_fn(gm, 
example_inputs) 2025-12-04T09:53:09.3624997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3625993Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3626857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3627665Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3628491Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3629486Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3630576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3631369Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3632206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3633058Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3633788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3634564Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3635098Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3635488Z 2025-12-04T09:53:09.3635708Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3636719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3637527Z 2025-12-04T09:53:09.3637796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3638442Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T09:53:09.3638906Z unimplemented [] 2025-12-04T09:53:09.3639245Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3639646Z inductor [] 2025-12-04T09:53:09.3639885Z graph_break [] 2025-12-04T09:53:09.3640265Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3641449Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3642592Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3643540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3644522Z warnings.warn( 2025-12-04T09:53:09.3644912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3645370Z unimplemented [] 2025-12-04T09:53:09.3645700Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3646096Z inductor [] 2025-12-04T09:53:09.3646350Z graph_break [] 2025-12-04T09:53:09.3646710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3647886Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.3648955Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3649903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3650880Z warnings.warn( 2025-12-04T09:53:09.3651264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3651734Z unimplemented [] 2025-12-04T09:53:09.3652049Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3652443Z inductor [] 2025-12-04T09:53:09.3652691Z graph_break [] 2025-12-04T09:53:09.3653053Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3654311Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3655384Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3656338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3657296Z warnings.warn( 2025-12-04T09:53:09.3658297Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml - 2025-12-04T09:53:09.3659362Z =========================== short test summary info ============================ 2025-12-04T09:53:09.3660542Z FAILED [0.1403s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3661518Z 2025-12-04T09:53:09.3661737Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3662743Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py 
AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3663542Z 2025-12-04T09:53:09.3663809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3664401Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:53:09.3664950Z ======== 1 failed, 41 passed, 22 skipped, 2 rerun in 255.81s (0:04:15) ========= 2025-12-04T09:53:09.3665430Z Got exit code 1 2025-12-04T09:53:09.3665708Z Retrying single test... 2025-12-04T09:53:09.3666327Z W1204 09:49:32.739000 6810 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.3667492Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml 2025-12-04T09:53:09.3668374Z ============================= test session starts ============================== 2025-12-04T09:53:09.3669044Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:53:09.3669644Z cachedir: .pytest_cache 2025-12-04T09:53:09.3670362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:53:09.3671149Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:53:09.3671493Z configfile: pytest.ini 2025-12-04T09:53:09.3672229Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:53:09.3673142Z collecting ... collected 934 items / 151 deselected / 783 selected 2025-12-04T09:53:09.3674249Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3675224Z Running 1 items in this shard 2025-12-04T09:53:09.3675450Z 2025-12-04T09:53:09.3676280Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.9725s] [100%] 2025-12-04T09:53:09.3678079Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1906s] [100%] 2025-12-04T09:53:09.3679776Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1403s] [100%] 2025-12-04T09:53:09.3680642Z 2025-12-04T09:53:09.3680798Z ==================================== RERUNS ==================================== 2025-12-04T09:53:09.3681412Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3682077Z Traceback (most recent call last): 2025-12-04T09:53:09.3682825Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3683471Z return value(self) 2025-12-04T09:53:09.3684161Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3684932Z self.check_model(model, inps) 2025-12-04T09:53:09.3685795Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3686638Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3687259Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3687947Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3688620Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 
172, in compile 2025-12-04T09:53:09.3689382Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3690263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3691075Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3691874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3692655Z raise e 2025-12-04T09:53:09.3693347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3694136Z return func( 2025-12-04T09:53:09.3694844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3695773Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3696617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3697324Z return compile_fx_aot( 2025-12-04T09:53:09.3698030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3698803Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3699530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3700245Z return compile_fx( 2025-12-04T09:53:09.3701072Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3701833Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3702677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3703525Z return 
_compile_fx_main( 2025-12-04T09:53:09.3704265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3705127Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3705988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3706807Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3707605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3708385Z return compile_fx_forward( 2025-12-04T09:53:09.3709250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3710038Z return inner_compile( 2025-12-04T09:53:09.3710523Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3711052Z return func(*args, **kwds) 2025-12-04T09:53:09.3711931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3712853Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3713773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3714575Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3715399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3716336Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3717165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3717976Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T09:53:09.3718808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3719813Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3720799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3721590Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3722448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3723279Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3723996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3724770Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3725297Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3725685Z 2025-12-04T09:53:09.3725904Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3726918Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3727724Z 2025-12-04T09:53:09.3727997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3728642Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3729105Z unimplemented [] 2025-12-04T09:53:09.3729438Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3729837Z inductor [] 2025-12-04T09:53:09.3730065Z graph_break [] 2025-12-04T09:53:09.3730443Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3731649Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: 
`isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3732731Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3733683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3734661Z warnings.warn( 2025-12-04T09:53:09.3735162Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3735763Z Traceback (most recent call last): 2025-12-04T09:53:09.3736397Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3737049Z return value(self) 2025-12-04T09:53:09.3737737Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3738496Z self.check_model(model, inps) 2025-12-04T09:53:09.3739165Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3739970Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3740598Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3741270Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3741960Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3742723Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3743581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3744446Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3745256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3746042Z raise e 
2025-12-04T09:53:09.3746722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3747517Z return func( 2025-12-04T09:53:09.3748237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3749163Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3750004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3750723Z return compile_fx_aot( 2025-12-04T09:53:09.3751427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3752177Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3752903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3753628Z return compile_fx( 2025-12-04T09:53:09.3754277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3755034Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3755883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3756719Z return _compile_fx_main( 2025-12-04T09:53:09.3757433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3758285Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3759158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3759974Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3760757Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3761528Z return compile_fx_forward( 2025-12-04T09:53:09.3762331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3763097Z return inner_compile( 2025-12-04T09:53:09.3763590Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3764134Z return func(*args, **kwds) 2025-12-04T09:53:09.3764864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3765772Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3766681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3767499Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3768385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3769236Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3770075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3770877Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3771691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3772781Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3773779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3774579Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3775390Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3776224Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3776958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3777734Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3778250Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3778650Z 2025-12-04T09:53:09.3778869Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3779887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3780678Z 2025-12-04T09:53:09.3780946Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3781591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3782072Z unimplemented [] 2025-12-04T09:53:09.3782415Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3782800Z inductor [] 2025-12-04T09:53:09.3783047Z graph_break [] 2025-12-04T09:53:09.3783429Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3784606Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.3785682Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3786657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3787630Z warnings.warn( 2025-12-04T09:53:09.3787999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3788472Z unimplemented [] 2025-12-04T09:53:09.3788799Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3789184Z inductor [] 2025-12-04T09:53:09.3789433Z graph_break [] 2025-12-04T09:53:09.3789801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3790974Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3792025Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3792983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3793960Z warnings.warn( 2025-12-04T09:53:09.3794265Z =================================== FAILURES =================================== 2025-12-04T09:53:09.3794898Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3795495Z Traceback (most recent call last): 2025-12-04T09:53:09.3796214Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3796855Z return value(self) 2025-12-04T09:53:09.3797545Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3798318Z self.check_model(model, inps) 2025-12-04T09:53:09.3798979Z File 
"/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3799677Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3800360Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3801206Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3801881Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3802694Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3803566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3804358Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3805173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3805961Z raise e 2025-12-04T09:53:09.3806654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3807421Z return func( 2025-12-04T09:53:09.3808149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3809080Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3809906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3810629Z return compile_fx_aot( 2025-12-04T09:53:09.3811341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3812111Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3812823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 
2025-12-04T09:53:09.3813552Z return compile_fx( 2025-12-04T09:53:09.3814211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3814965Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3815797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3816631Z return _compile_fx_main( 2025-12-04T09:53:09.3817347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3818190Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3819054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3819874Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3820673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3821428Z return compile_fx_forward( 2025-12-04T09:53:09.3822173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3822945Z return inner_compile( 2025-12-04T09:53:09.3823417Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3823955Z return func(*args, **kwds) 2025-12-04T09:53:09.3824799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3825716Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3826610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3827426Z inner_compiled_fn = compiler_fn(gm, 
example_inputs) 2025-12-04T09:53:09.3828241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3829166Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3829989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3830789Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3831612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3832610Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3833612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3834412Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3835216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3836021Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3836758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3837536Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3838071Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3838464Z 2025-12-04T09:53:09.3838681Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3839690Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3840481Z 2025-12-04T09:53:09.3840758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3841397Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T09:53:09.3841854Z unimplemented [] 2025-12-04T09:53:09.3842241Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3842653Z inductor [] 2025-12-04T09:53:09.3842888Z graph_break [] 2025-12-04T09:53:09.3843267Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3844463Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3845534Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3846510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3847496Z warnings.warn( 2025-12-04T09:53:09.3847888Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3848348Z unimplemented [] 2025-12-04T09:53:09.3848677Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3849072Z inductor [] 2025-12-04T09:53:09.3849303Z graph_break [] 2025-12-04T09:53:09.3849682Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3850872Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.3851945Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3852968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3853947Z warnings.warn( 2025-12-04T09:53:09.3854340Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3854799Z unimplemented [] 2025-12-04T09:53:09.3855128Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3855524Z inductor [] 2025-12-04T09:53:09.3855774Z graph_break [] 2025-12-04T09:53:09.3856138Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3857383Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3858449Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3859395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3860367Z warnings.warn( 2025-12-04T09:53:09.3861290Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml - 2025-12-04T09:53:09.3862353Z =========================== short test summary info ============================ 2025-12-04T09:53:09.3863520Z FAILED [0.1403s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3864507Z 2025-12-04T09:53:09.3864727Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3865737Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py 
AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3866528Z 2025-12-04T09:53:09.3866815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3867407Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:53:09.3867937Z ================== 1 failed, 151 deselected, 2 rerun in 1.39s ================== 2025-12-04T09:53:09.3868391Z Got exit code 1 2025-12-04T09:53:09.3868665Z Retrying single test... 2025-12-04T09:53:09.3869290Z W1204 09:49:52.908000 6979 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.3870451Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml 2025-12-04T09:53:09.3871338Z ============================= test session starts ============================== 2025-12-04T09:53:09.3871989Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:53:09.3872599Z cachedir: .pytest_cache 2025-12-04T09:53:09.3873321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:53:09.3874111Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:53:09.3874454Z configfile: pytest.ini 2025-12-04T09:53:09.3875183Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:53:09.3876097Z collecting ... collected 934 items / 151 deselected / 783 selected 2025-12-04T09:53:09.3877208Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3878180Z Running 1 items in this shard 2025-12-04T09:53:09.3878407Z 2025-12-04T09:53:09.3879249Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.9643s] [100%] 2025-12-04T09:53:09.3881141Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1832s] [100%] 2025-12-04T09:53:09.3882912Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1363s] [100%] 2025-12-04T09:53:09.3883783Z 2025-12-04T09:53:09.3883931Z ==================================== RERUNS ==================================== 2025-12-04T09:53:09.3884573Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3885239Z Traceback (most recent call last): 2025-12-04T09:53:09.3885889Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3886530Z return value(self) 2025-12-04T09:53:09.3887216Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3887992Z self.check_model(model, inps) 2025-12-04T09:53:09.3888658Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3889359Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3889980Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3890655Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3891323Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 
172, in compile 2025-12-04T09:53:09.3892081Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3892960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3893744Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3894554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3895338Z raise e 2025-12-04T09:53:09.3896025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3896796Z return func( 2025-12-04T09:53:09.3897509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3898435Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3899256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3899979Z return compile_fx_aot( 2025-12-04T09:53:09.3900682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3901605Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3902316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3903044Z return compile_fx( 2025-12-04T09:53:09.3903705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3904458Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3905298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3906134Z return 
_compile_fx_main( 2025-12-04T09:53:09.3906857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3907710Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3908572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3909389Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3910299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3911052Z return compile_fx_forward( 2025-12-04T09:53:09.3911793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3912569Z return inner_compile( 2025-12-04T09:53:09.3913037Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3913579Z return func(*args, **kwds) 2025-12-04T09:53:09.3914381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3915294Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3916205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3917039Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3917862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3918712Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3919540Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3920346Z mb_compiled_graph = fx_codegen_and_compile( 
2025-12-04T09:53:09.3921174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3922234Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3923244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3924049Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3924865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3925672Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3926412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3927195Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3927723Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3928114Z 2025-12-04T09:53:09.3928331Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3929349Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3930133Z 2025-12-04T09:53:09.3930414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3931054Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3931515Z unimplemented [] 2025-12-04T09:53:09.3931850Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3932246Z inductor [] 2025-12-04T09:53:09.3932474Z graph_break [] 2025-12-04T09:53:09.3932845Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3934031Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: 
`isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.3935097Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.3936059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.3937034Z warnings.warn( 2025-12-04T09:53:09.3937530Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.3938205Z Traceback (most recent call last): 2025-12-04T09:53:09.3938856Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.3939508Z return value(self) 2025-12-04T09:53:09.3940181Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.3940954Z self.check_model(model, inps) 2025-12-04T09:53:09.3941624Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.3942387Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.3942992Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.3943673Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.3944356Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.3945098Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.3945968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.3946770Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.3947577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3948343Z raise e 
2025-12-04T09:53:09.3949024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.3949810Z return func( 2025-12-04T09:53:09.3950525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.3951448Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.3952291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.3953012Z return compile_fx_aot( 2025-12-04T09:53:09.3953704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.3954475Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.3955199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 2025-12-04T09:53:09.3955922Z return compile_fx( 2025-12-04T09:53:09.3956568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.3957322Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.3958162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.3958981Z return _compile_fx_main( 2025-12-04T09:53:09.3959844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.3960698Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.3961559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.3962430Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3963227Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.3964004Z return compile_fx_forward( 2025-12-04T09:53:09.3964755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.3965514Z return inner_compile( 2025-12-04T09:53:09.3966002Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.3966544Z return func(*args, **kwds) 2025-12-04T09:53:09.3967340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.3968257Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.3969167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.3969982Z inner_compiled_fn = compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.3970789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.3971699Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.3972534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.3973320Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.3974142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.3975144Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.3976138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.3976915Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.3977720Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.3978545Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.3979280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.3980035Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.3980563Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.3980955Z 2025-12-04T09:53:09.3981188Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.3982190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.3982978Z 2025-12-04T09:53:09.3983246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.3996507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.3997127Z unimplemented [] 2025-12-04T09:53:09.3997484Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.3997887Z inductor [] 2025-12-04T09:53:09.3998138Z graph_break [] 2025-12-04T09:53:09.3998509Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.3999713Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.4000805Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.4001952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.4002991Z warnings.warn( 2025-12-04T09:53:09.4003388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.4003858Z unimplemented [] 2025-12-04T09:53:09.4004172Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.4004582Z inductor [] 2025-12-04T09:53:09.4004823Z graph_break [] 2025-12-04T09:53:09.4005185Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.4006364Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.4007432Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.4008609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.4009580Z warnings.warn( 2025-12-04T09:53:09.4009893Z =================================== FAILURES =================================== 2025-12-04T09:53:09.4010509Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______ 2025-12-04T09:53:09.4011095Z Traceback (most recent call last): 2025-12-04T09:53:09.4011835Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test 2025-12-04T09:53:09.4012478Z return value(self) 2025-12-04T09:53:09.4013172Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion 2025-12-04T09:53:09.4013934Z self.check_model(model, inps) 2025-12-04T09:53:09.4014598Z File 
"/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model 2025-12-04T09:53:09.4015291Z actual = AOTIRunnerUtil.run( 2025-12-04T09:53:09.4015896Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run 2025-12-04T09:53:09.4016559Z package_path = AOTIRunnerUtil.compile( 2025-12-04T09:53:09.4017228Z File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile 2025-12-04T09:53:09.4017974Z package_path = torch._inductor.aoti_compile_and_package( 2025-12-04T09:53:09.4018817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package 2025-12-04T09:53:09.4019608Z return aot_inductor_minifier_wrapper( 2025-12-04T09:53:09.4020405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.4021160Z raise e 2025-12-04T09:53:09.4021841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper 2025-12-04T09:53:09.4022610Z return func( 2025-12-04T09:53:09.4023325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner 2025-12-04T09:53:09.4024241Z aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs) 2025-12-04T09:53:09.4025079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile 2025-12-04T09:53:09.4025793Z return compile_fx_aot( 2025-12-04T09:53:09.4026495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot 2025-12-04T09:53:09.4027243Z compiled_artifacts = compile_fx( 2025-12-04T09:53:09.4027962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx 
2025-12-04T09:53:09.4028686Z return compile_fx( 2025-12-04T09:53:09.4029338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx 2025-12-04T09:53:09.4030083Z return _maybe_wrap_and_compile_fx_main( 2025-12-04T09:53:09.4030928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main 2025-12-04T09:53:09.4031758Z return _compile_fx_main( 2025-12-04T09:53:09.4032465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main 2025-12-04T09:53:09.4033343Z return inference_compiler(unlifted_gm, example_inputs_) 2025-12-04T09:53:09.4034217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__ 2025-12-04T09:53:09.4035025Z return self.compiler_fn(gm, example_inputs) 2025-12-04T09:53:09.4035820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base 2025-12-04T09:53:09.4036659Z return compile_fx_forward( 2025-12-04T09:53:09.4037406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward 2025-12-04T09:53:09.4038173Z return inner_compile( 2025-12-04T09:53:09.4038657Z File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner 2025-12-04T09:53:09.4039199Z return func(*args, **kwds) 2025-12-04T09:53:09.4039906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner 2025-12-04T09:53:09.4040881Z return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( 2025-12-04T09:53:09.4041790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper 2025-12-04T09:53:09.4042683Z inner_compiled_fn = compiler_fn(gm, 
example_inputs) 2025-12-04T09:53:09.4043498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner 2025-12-04T09:53:09.4044339Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T09:53:09.4045175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner 2025-12-04T09:53:09.4045981Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T09:53:09.4046790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T09:53:09.4047799Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T09:53:09.4048799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile 2025-12-04T09:53:09.4049599Z _check_triton_bf16_support(graph) 2025-12-04T09:53:09.4050398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support 2025-12-04T09:53:09.4051215Z warn_and_skip(node.get_device()) 2025-12-04T09:53:09.4051946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip 2025-12-04T09:53:09.4052702Z raise SkipFrame("BF16 is not supported") 2025-12-04T09:53:09.4053223Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.4053623Z 2025-12-04T09:53:09.4053840Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.4054858Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.4055646Z 2025-12-04T09:53:09.4055914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.4056550Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T09:53:09.4057020Z unimplemented [] 2025-12-04T09:53:09.4057334Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.4057728Z inductor [] 2025-12-04T09:53:09.4057971Z graph_break [] 2025-12-04T09:53:09.4058349Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.4059527Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.4060603Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.4061568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.4062542Z warnings.warn( 2025-12-04T09:53:09.4062918Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.4063392Z unimplemented [] 2025-12-04T09:53:09.4063716Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.4064180Z inductor [] 2025-12-04T09:53:09.4064424Z graph_break [] 2025-12-04T09:53:09.4064799Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.4065963Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 
2025-12-04T09:53:09.4067031Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.4067990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.4069028Z warnings.warn( 2025-12-04T09:53:09.4069400Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:53:09.4069866Z unimplemented [] 2025-12-04T09:53:09.4070194Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T09:53:09.4070572Z inductor [] 2025-12-04T09:53:09.4070812Z graph_break [] 2025-12-04T09:53:09.4071192Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T09:53:09.4072372Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead. 2025-12-04T09:53:09.4073429Z return cls.__new__(cls, *args) 2025-12-04T09:53:09.4074379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T09:53:09.4075349Z warnings.warn( 2025-12-04T09:53:09.4076259Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml - 2025-12-04T09:53:09.4077326Z =========================== short test summary info ============================ 2025-12-04T09:53:09.4078511Z FAILED [0.1363s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported 2025-12-04T09:53:09.4079487Z 2025-12-04T09:53:09.4079720Z To execute this test, run the following from the base repo dir: 2025-12-04T09:53:09.4080716Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py 
AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.4081515Z 2025-12-04T09:53:09.4081781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:53:09.4082451Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:53:09.4082988Z ================== 1 failed, 151 deselected, 2 rerun in 1.37s ================== 2025-12-04T09:53:09.4083429Z Got exit code 1 2025-12-04T09:53:09.4084173Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda 2025-12-04T09:53:09.4085311Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:53:09.4086308Z W1204 09:50:12.514000 7148 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.4087444Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml 2025-12-04T09:53:09.4088318Z ============================= test session starts ============================== 2025-12-04T09:53:09.4088985Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:53:09.4089593Z cachedir: .pytest_cache 2025-12-04T09:53:09.4090291Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:53:09.4091079Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:53:09.4091434Z configfile: pytest.ini 2025-12-04T09:53:09.4092230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:53:09.4093142Z collecting ... collected 934 items / 64 deselected / 870 selected 2025-12-04T09:53:09.4093654Z stepcurrent: skipping 64 already run items. 
2025-12-04T09:53:09.4094050Z Running 88 items in this shard 2025-12-04T09:53:09.4094264Z 2025-12-04T09:53:09.4094936Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda <- test/inductor/test_torchinductor.py PASSED [9.3493s] [ 1%] 2025-12-04T09:53:09.4097014Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_on_disk_cuda <- test/inductor/test_torchinductor.py W1204 09:50:25.748000 7148 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T09:53:09.4098461Z PASSED [15.5315s] [ 2%] 2025-12-04T09:53:09.4099562Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda SKIPPED [0.0004s] (install_free_tensors leads to OOM - https://github.com/pytorch/pytorch/issues/164062) [ 3%] 2025-12-04T09:53:09.4101439Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda SKIPPED [0.0002s] (Skip this test, only for local test. SIGABRT is produced.) 
[ 4%] 2025-12-04T09:53:09.4102926Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda <- test/inductor/test_torchinductor.py PASSED [5.9330s] [ 5%] 2025-12-04T09:53:09.4104408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda <- test/inductor/test_torchinductor.py PASSED [5.4572s] [ 6%] 2025-12-04T09:53:09.4105748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda XFAIL [0.0330s] [ 7%] 2025-12-04T09:53:09.4107080Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda PASSED [11.5715s] [ 9%] 2025-12-04T09:53:09.4108628Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda <- test/inductor/test_torchinductor.py PASSED [5.1252s] [ 10%] 2025-12-04T09:53:09.4110159Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [7.8888s] [ 11%] 2025-12-04T09:53:09.4111670Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda <- test/inductor/test_torchinductor.py PASSED [6.1924s] [ 12%] 2025-12-04T09:53:09.4113195Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda SKIPPED [0.0003s] (scaled_grouped_mm is only supported on SM90) [ 13%] 2025-12-04T09:53:09.4114684Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda SKIPPED [0.0002s] (bfloat16 only supported in sm80+ or XPU) [ 14%] 2025-12-04T09:53:09.4116142Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [6.0785s] [ 15%] 2025-12-04T09:53:09.4117761Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda SKIPPED [0.0003s] (Test is only supported on CUDA 12.8+) [ 
17%] 2025-12-04T09:53:09.4120049Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda <- test/inductor/test_torchinductor.py W1204 09:51:27.909000 7148 site-packages/torch/_inductor/ir.py:8050] [0/0] aten._unique2.default is missing a c-shim implementation, using proxy executor as fallback 2025-12-04T09:53:09.4121645Z PASSED [6.1285s] [ 18%] 2025-12-04T09:53:09.4122557Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda <- test/inductor/test_torchinductor.py PASSED [5.2771s] [ 19%] 2025-12-04T09:53:09.4124012Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda <- test/inductor/test_torchinductor.py PASSED [6.5355s] [ 20%] 2025-12-04T09:53:09.4125489Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda PASSED [6.3012s] [ 21%] 2025-12-04T09:53:09.4126782Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda PASSED [5.9789s] [ 22%] 2025-12-04T09:53:09.4128202Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda PASSED [9.1404s] [ 23%] 2025-12-04T09:53:09.4129772Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda PASSED [5.8727s] [ 25%] 2025-12-04T09:53:09.4131685Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda SKIPPED [0.0031s] (requires triton.tools.experimental_descriptor TMA support) [ 26%] 2025-12-04T09:53:09.4133761Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda SKIPPED [0.0029s] (requires triton.tools.tensor_descriptor TMA support) [ 27%] 2025-12-04T09:53:09.4135816Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda SKIPPED [0.0030s] (requires triton.tools.tensor_descriptor TMA support) [ 28%] 2025-12-04T09:53:09.4137903Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda SKIPPED [0.0027s] (requires triton.tools.experimental_descriptor TMA support) [ 29%] 2025-12-04T09:53:09.4139787Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda PASSED [5.9304s] [ 30%] 2025-12-04T09:53:09.4141736Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:71] +============================+ 2025-12-04T09:53:09.4143301Z W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:72] | !!! WARNING !!! | 2025-12-04T09:53:09.4144148Z W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:73] +============================+ 2025-12-04T09:53:09.4145861Z W1204 09:52:24.605000 7148 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead. 
2025-12-04T09:53:09.4147355Z Error: Expected u1 >= 1 but received 0 2025-12-04T09:53:09.4147715Z PASSED [11.3883s] [ 31%] 2025-12-04T09:53:09.4148608Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda PASSED [8.0390s] [ 32%] 2025-12-04T09:53:09.4150658Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda W1204 09:52:39.830000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.4152111Z PASSED [8.1713s] [ 34%] 2025-12-04T09:53:09.4153287Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda W1204 09:52:46.937000 7148 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T09:53:09.4154517Z PASSED [6.4180s] [ 35%] 2025-12-04T09:53:09.4155361Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda PASSED [6.8472s] [ 36%] 2025-12-04T09:53:09.4157549Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda W1204 09:52:59.866000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead 2025-12-04T09:53:09.4159655Z W1204 09:52:59.866000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. 
Please use Dim.STATIC instead 2025-12-04T09:53:09.4160559Z PASSED [7.0304s] [ 37%] 2025-12-04T09:53:09.4161504Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps SKIPPED [0.0004s] (No MPS backend available) [ 38%] 2025-12-04T09:53:09.4163333Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps SKIPPED [0.0002s] (No MPS backend available) [ 39%] 2025-12-04T09:53:09.4165262Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 40%] 2025-12-04T09:53:09.4167043Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 42%] 2025-12-04T09:53:09.4168906Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (No MPS backend available) [ 43%] 2025-12-04T09:53:09.4170582Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps SKIPPED [0.0003s] (No MPS backend available) [ 44%] 2025-12-04T09:53:09.4172177Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 45%] 2025-12-04T09:53:09.4173977Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 46%] 2025-12-04T09:53:09.4175707Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps SKIPPED [0.0003s] (No MPS backend available) [ 47%] 2025-12-04T09:53:09.4177497Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 48%] 2025-12-04T09:53:09.4179145Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps SKIPPED [0.0002s] (No MPS backend available) [ 50%] 2025-12-04T09:53:09.4180718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 51%] 2025-12-04T09:53:09.4182511Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 52%] 2025-12-04T09:53:09.4184265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 53%] 2025-12-04T09:53:09.4185989Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%] 2025-12-04T09:53:09.4187793Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 55%] 2025-12-04T09:53:09.4189524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps SKIPPED [0.0002s] (No MPS backend available) [ 56%] 2025-12-04T09:53:09.4191195Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 57%] 2025-12-04T09:53:09.4193044Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED 
[0.0002s] (No MPS backend available) [ 59%] 2025-12-04T09:53:09.4194577Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps SKIPPED [0.0002s] (No MPS backend available) [ 60%] 2025-12-04T09:53:09.4195875Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps SKIPPED [0.0002s] (No MPS backend available) [ 61%] 2025-12-04T09:53:09.4197211Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps SKIPPED [0.0002s] (No MPS backend available) [ 62%] 2025-12-04T09:53:09.4198564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps SKIPPED [0.0002s] (No MPS backend available) [ 63%] 2025-12-04T09:53:09.4200077Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 64%] 2025-12-04T09:53:09.4201991Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 65%] 2025-12-04T09:53:09.4203636Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps SKIPPED [0.0002s] (No MPS backend available) [ 67%] 2025-12-04T09:53:09.4205286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 68%] 2025-12-04T09:53:09.4207096Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 69%] 2025-12-04T09:53:09.4208755Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%] 2025-12-04T09:53:09.4210293Z 
inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps SKIPPED [0.0002s] (No MPS backend available) [ 71%] 2025-12-04T09:53:09.4212040Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 72%] 2025-12-04T09:53:09.4213833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 73%] 2025-12-04T09:53:09.4215593Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%] 2025-12-04T09:53:09.4217286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 76%] 2025-12-04T09:53:09.4218963Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps SKIPPED [0.0002s] (No MPS backend available) [ 77%] 2025-12-04T09:53:09.4220661Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 78%] 2025-12-04T09:53:09.4222265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps SKIPPED [0.0002s] (No MPS backend available) [ 79%] 2025-12-04T09:53:09.4223894Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 80%] 2025-12-04T09:53:09.4225718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 81%] 
2025-12-04T09:53:09.4227671Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps SKIPPED [0.0004s] (No MPS backend available) [ 82%] 2025-12-04T09:53:09.4229524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 84%] 2025-12-04T09:53:09.4231347Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%] 2025-12-04T09:53:09.4233321Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 86%] 2025-12-04T09:53:09.4235237Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 87%] 2025-12-04T09:53:09.4237039Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 88%] 2025-12-04T09:53:09.4238819Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 89%] 2025-12-04T09:53:09.4240672Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 90%] 2025-12-04T09:53:09.4242641Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 92%] 
2025-12-04T09:53:09.4244462Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 93%]
2025-12-04T09:53:09.4246214Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 94%]
2025-12-04T09:53:09.4247870Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 95%]
2025-12-04T09:53:09.4249592Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 96%]
2025-12-04T09:53:09.4251403Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 97%]
2025-12-04T09:53:09.4253220Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 98%]
2025-12-04T09:53:09.4255097Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%]
2025-12-04T09:53:09.4256137Z
2025-12-04T09:53:09.4256903Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml -
2025-12-04T09:53:09.4258055Z ===== 23 passed, 64 skipped, 64 deselected, 1 xfailed in 172.47s (0:02:52) =====
2025-12-04T09:53:09.4259143Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda']
2025-12-04T09:53:09.4259957Z
2025-12-04T09:53:09.4260526Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 4/6 (test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log)
2025-12-04T09:53:09.4261292Z
2025-12-04T09:53:09.4261647Z Finished inductor/test_aot_inductor 4/6 ... [2025-12-04 09:53:09.315070][2346.924975125], took 8.32min
2025-12-04T09:53:09.4262944Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml
2025-12-04T09:53:09.6325625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml
2025-12-04T09:53:09.6598249Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml
2025-12-04T09:53:09.7067049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml
2025-12-04T09:53:09.9007293Z Uploading logs for 57119749427 to S3
2025-12-04T09:53:09.9281172Z Uploading artifacts took 0.18 seconds
2025-12-04T09:53:09.9281648Z inductor/test_aot_inductor 4/6 failed!
2025-12-04T09:53:09.9286990Z Running inductor/test_torchinductor_dynamic_shapes 1/5 ... [2025-12-04 09:53:09.928522][2347.538431633]
2025-12-04T09:53:09.9287724Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:53:09.9292026Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=1', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:53:09.928958] 2025-12-04T10:02:00.0315451Z 2025-12-04T10:02:00.0319392Z inductor/test_torchinductor_dynamic_shapes 1/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.5_8dad9aa6fdc82df0_.log 2025-12-04T10:02:00.0534587Z Running 350 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex9_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aliased_buffer_reuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_dtype_device_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_support_str_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange4_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin_with_nan_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_min_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_to_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_as_strided_on_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int8_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_chunk_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_compar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_with_scalar_src_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cummin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dont_constant_fold_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_bag_byte_unpack_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_index_expression_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_argmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_no_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_pad_dynamic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_refcount_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_indirect_load_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inner_fn_str_and_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_insignificant_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_kernel_names_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_l1_loss_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_strided_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linalg_eig_stride_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_dynamic_maxautotune_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_dynamic_shape_assertion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_2_dim_3_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_move_arange_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_one_hot_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_expit_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_hermite_polynomial_h_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_roll_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_round_correctness_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_reduce3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sizehint_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_view_with_graph_break_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sort_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stack_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_topk_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsigned_constant_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_cat_conv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_div_by_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_real_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_detach_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views7_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_bwd_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_addmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_support_out_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_batch_norm_2d_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bernoulli1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bfloat16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bmm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_use_after_remove_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_negative_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_from_real_imag_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_memory_overlap_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_with_scalar_src_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_tensor_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dont_constant_fold_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtype_sympy_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_sparse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_truncation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_getitem_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_scalar_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isin_tensor_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_broadcast_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_strided_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_float64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_mode_not_decompose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_triton_kernel_wrapper_functional_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_fp64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logcumsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_long_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_2_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_min_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_cast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pattern_matcher_multi_user_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammaln_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_multigammaln_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_zeta_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rand_like_deterministic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction_config_limit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_replication_pad_errors_with_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_signbit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_silu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_failed_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_dynamic_shape_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_integer_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_stack_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_constant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_uint4x2_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unbind_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_bilinear2d_b_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views6_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_where_with_logical_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_xblock_divides_xnumel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arange_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arithmetic_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_cat_unbacked_duplicate_size_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_full_symbolic_value_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_unbacked_stride_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op0_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op3_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_no_realloc_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_fallback_specialization_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unwrap_storage_didnt_work_repro_cuda 2025-12-04T10:02:00.0746570Z 2025-12-04T10:02:00.0747062Z Finished inductor/test_torchinductor_dynamic_shapes 1/5 ... [2025-12-04 10:02:00.032256][2877.642162343], took 8.84min 2025-12-04T10:02:00.0748660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.xml 2025-12-04T10:02:00.1338993Z Running inductor/test_torchinductor_dynamic_shapes 5/5 ... 
[2025-12-04 10:02:00.133578][2877.743485122] 2025-12-04T10:02:00.1339676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:02:00.1342655Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:00.134016] 2025-12-04T10:11:03.9776599Z 2025-12-04T10:11:03.9777861Z inductor/test_torchinductor_dynamic_shapes 5/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.5_0c7fd80a5a340f9b_.log 2025-12-04T10:11:04.0001355Z Running 370 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_alexnet_prefix_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_with_scalar_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool_errors_with_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_baddbmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bmm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_inplace_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_backward_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_default_layout_constraint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dense_mask_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_diagonal_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_by_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_sparse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_repr_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generate_rand_fp8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_constant_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_mutation_real_name_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardsigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardswish_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_failed_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_multiple_specializations_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_issue102546_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_mixed_dtype_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_repeated_blocks_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_fp64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logaddexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logcumsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_min_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_neg_max_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nll_loss_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_view_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pattern_matcher_multi_user_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_airy_ai_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_y0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_digamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_logit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_prod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_with_dtype_and_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction_config_limit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_resize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_searchsorted_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tan_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor_index_put_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_to_memory_format_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_correction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_broadcast_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_with_logical_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_const_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adding_tensor_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_addmv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_cache_hit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_on_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_add_autotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_computed_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_nd_tiling_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_single_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_concat_add_inplace_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_shape_check_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_cpp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dist_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div9_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_by_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_presicion_accuracy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int32_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fft_real_input_real_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flip_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fusing_write_into_disjoint_read_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gpu_scalar_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_grid_sampler_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_horizonal_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inner_reduction_detection_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_insignificant_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_block_sizes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d2_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mean_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_misaligned_address_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mix_device_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mixed_mm3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_mixed_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mul_index_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mul_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_threading_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_max_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_y1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_entr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_hermite_polynomial_h_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_psi_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_scaled_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlog1py_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_polar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow_int_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_profiler_mark_wrapper_call_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reflection_pad2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_roll_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_setitem_with_int_parameter_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_extremal_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_stable_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_with_int64_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_std_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_keepdims_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest1d_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_bwd_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zeros_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_adaptive_max_pool3d_with_indices_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_is_integer_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_inf_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op4_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op5_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op7_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_size_factory_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_pad_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_slice_index_changing_sign_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sub_constant_folding_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_save_for_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_softshrink_cuda 2025-12-04T10:11:04.0219308Z 2025-12-04T10:11:04.0219769Z Finished inductor/test_torchinductor_dynamic_shapes 5/5 ... [2025-12-04 10:11:03.978540][3421.588446977], took 9.06min 2025-12-04T10:11:04.0221315Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.xml 2025-12-04T10:11:04.0743656Z Running inductor/test_kernel_benchmark 1/1 ... [2025-12-04 10:11:04.074090][3421.683997583] 2025-12-04T10:11:04.0744249Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:11:04.0747683Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_kernel_benchmark.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:11:04.074508] 2025-12-04T10:15:47.8002433Z 2025-12-04T10:15:47.8003407Z PRINTING LOG FILE of inductor/test_kernel_benchmark 1/1 (test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log) 2025-12-04T10:15:47.8004881Z W1204 10:11:12.599000 34099 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8006638Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml 2025-12-04T10:15:47.8007943Z ============================= test session starts ============================== 2025-12-04T10:15:47.8008638Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:15:47.8009291Z cachedir: .pytest_cache 2025-12-04T10:15:47.8009991Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:15:47.8010758Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:15:47.8011105Z configfile: pytest.ini 2025-12-04T10:15:47.8011814Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:15:47.8012595Z collecting ... 
collected 18 items 2025-12-04T10:15:47.8012984Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:15:47.8021823Z Running 18 items in this shard: test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_fused_layernorm_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_triton_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation_2, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_pw_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_reduction_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_multiple_kernels, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_scalar, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_templates, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_cat_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_split_scan, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_star_dep, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_unused_input_bandwidth_computation 2025-12-04T10:15:47.8030911Z 2025-12-04T10:15:47.8031428Z 
inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_fused_layernorm_bandwidth_computation PASSED [20.4646s] [ 5%] 2025-12-04T10:15:47.8032998Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_bandwidth_computation W1204 10:11:34.107000 34099 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8034145Z PASSED [13.4783s] [ 11%] 2025-12-04T10:15:47.8035108Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_triton_kernel_benchmark SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 16%] 2025-12-04T10:15:47.8036488Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation PASSED [13.1103s] [ 22%] 2025-12-04T10:15:47.8037765Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation_2 PASSED [12.9525s] [ 27%] 2025-12-04T10:15:47.8040200Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/118346 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 33%] 2025-12-04T10:15:47.8042851Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_pw_kernel_benchmark PASSED [13.4593s] [ 38%] 2025-12-04T10:15:47.8043897Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_reduction_bandwidth_computation PASSED [13.3010s] [ 44%] 2025-12-04T10:15:47.8044932Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps PASSED [21.5735s] [ 50%] 2025-12-04T10:15:47.8046015Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_multiple_kernels PASSED [24.0462s] [ 55%] 2025-12-04T10:15:47.8047116Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_scalar PASSED [21.6602s] [ 61%] 2025-12-04T10:15:47.8048480Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_templates SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 66%] 2025-12-04T10:15:47.8049851Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_bandwidth_computation PASSED [13.2732s] [ 72%] 2025-12-04T10:15:47.8050961Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_cat_bandwidth_computation PASSED [13.0895s] [ 77%] 2025-12-04T10:15:47.8052148Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1448s] [ 83%] 2025-12-04T10:15:47.8053408Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1044s] [ 83%] 2025-12-04T10:15:47.8054562Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1043s] [ 83%] 2025-12-04T10:15:47.8055175Z 2025-12-04T10:15:47.8055315Z ==================================== RERUNS ==================================== 2025-12-04T10:15:47.8055880Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8056421Z Traceback 
(most recent call last): 2025-12-04T10:15:47.8057185Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8057975Z out = f(*inputs) 2025-12-04T10:15:47.8058624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8059481Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8060362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8061193Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8062014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8062797Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8063585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8064573Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8065543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8066297Z graph.run(*example_inputs) 2025-12-04T10:15:47.8066990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8067641Z return super().run(*args) 2025-12-04T10:15:47.8068246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8068887Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8069560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8070240Z result = super().run_node(n) 
2025-12-04T10:15:47.8070865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8071653Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8072392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8073222Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8074037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8074834Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8075609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8076299Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8076959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8077661Z return autotune_select_algorithm( 2025-12-04T10:15:47.8078473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8079282Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8079987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8080872Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8082586Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8084059Z target: aten.mm.default 2025-12-04T10:15:47.8084356Z args[0]: TensorBox( 2025-12-04T10:15:47.8084635Z ReinterpretView( 2025-12-04T10:15:47.8084915Z StorageBox( 2025-12-04T10:15:47.8085457Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8086080Z ), 2025-12-04T10:15:47.8086499Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8087029Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8087366Z stack_traces = {, 2025-12-04T10:15:47.8087924Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8088535Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8088866Z , 2025-12-04T10:15:47.8089080Z } 2025-12-04T10:15:47.8089278Z ) 2025-12-04T10:15:47.8089489Z ) 2025-12-04T10:15:47.8089730Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8090331Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8090948Z )) 2025-12-04T10:15:47.8091082Z 2025-12-04T10:15:47.8091782Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8092604Z 2025-12-04T10:15:47.8092609Z 2025-12-04T10:15:47.8092832Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8093840Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8094546Z 2025-12-04T10:15:47.8094806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8095430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8095899Z frames [('total', 1)] 2025-12-04T10:15:47.8096175Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8096607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8097139Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8097502Z graph_break [] 2025-12-04T10:15:47.8097873Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8098515Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8099141Z Traceback (most recent call last): 2025-12-04T10:15:47.8099908Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8100677Z out = f(*inputs) 2025-12-04T10:15:47.8101542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8102401Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8103271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8104100Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8104931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:15:47.8105700Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8106493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8107469Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8108434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8109183Z graph.run(*example_inputs) 2025-12-04T10:15:47.8109788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8110431Z return super().run(*args) 2025-12-04T10:15:47.8111014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8111665Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8112341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8113018Z result = super().run_node(n) 2025-12-04T10:15:47.8113648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8114376Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8115118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8115950Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8116756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8117555Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8118340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 
2025-12-04T10:15:47.8119010Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8119682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8120381Z return autotune_select_algorithm( 2025-12-04T10:15:47.8121365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8122232Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8122948Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8123828Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8125476Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8127040Z target: aten.mm.default 2025-12-04T10:15:47.8127339Z args[0]: TensorBox( 2025-12-04T10:15:47.8127621Z ReinterpretView( 2025-12-04T10:15:47.8127881Z StorageBox( 2025-12-04T10:15:47.8128446Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8129066Z ), 2025-12-04T10:15:47.8129477Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8130031Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8130375Z stack_traces = {, 2025-12-04T10:15:47.8130932Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8142257Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8142740Z , 2025-12-04T10:15:47.8142951Z } 2025-12-04T10:15:47.8143173Z ) 2025-12-04T10:15:47.8143391Z ) 2025-12-04T10:15:47.8143623Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8144253Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8144878Z )) 2025-12-04T10:15:47.8144998Z 2025-12-04T10:15:47.8145711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8146552Z 2025-12-04T10:15:47.8146557Z 2025-12-04T10:15:47.8146769Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8147692Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8148401Z 2025-12-04T10:15:47.8148676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8149305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8149751Z frames [('total', 1)] 2025-12-04T10:15:47.8150040Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8150475Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8150936Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8151269Z graph_break [] 2025-12-04T10:15:47.8151550Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8152003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8152456Z frames [('total', 1)] 2025-12-04T10:15:47.8152742Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8153157Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8153627Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8153963Z graph_break [] 2025-12-04T10:15:47.8154231Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8154624Z =================================== FAILURES =================================== 2025-12-04T10:15:47.8155191Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8155731Z Traceback (most recent call last): 2025-12-04T10:15:47.8156495Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8157389Z out = f(*inputs) 
2025-12-04T10:15:47.8158049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8158898Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8159782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8160611Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8161497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8162346Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8163147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8164130Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8165094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8165844Z graph.run(*example_inputs) 2025-12-04T10:15:47.8166450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8167089Z return super().run(*args) 2025-12-04T10:15:47.8167672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8168332Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8169005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8169685Z result = super().run_node(n) 2025-12-04T10:15:47.8170308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8171040Z return getattr(self, n.op)(n.target, args, kwargs) 
2025-12-04T10:15:47.8171782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8172609Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8173420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8174220Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8175002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8175676Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8176349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8177054Z return autotune_select_algorithm( 2025-12-04T10:15:47.8177867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8178669Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8179375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8180252Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8181885Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8183359Z target: aten.mm.default 2025-12-04T10:15:47.8183654Z args[0]: TensorBox( 2025-12-04T10:15:47.8183933Z ReinterpretView( 2025-12-04T10:15:47.8184189Z StorageBox( 2025-12-04T10:15:47.8184824Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8185454Z ), 2025-12-04T10:15:47.8185866Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8186420Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8186764Z stack_traces = {, 2025-12-04T10:15:47.8187318Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8187933Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8188344Z , 2025-12-04T10:15:47.8188566Z } 2025-12-04T10:15:47.8188767Z ) 2025-12-04T10:15:47.8188984Z ) 2025-12-04T10:15:47.8189225Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8189824Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8190444Z )) 2025-12-04T10:15:47.8190563Z 2025-12-04T10:15:47.8191277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8192102Z 2025-12-04T10:15:47.8192106Z 2025-12-04T10:15:47.8192330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8193235Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8193954Z 2025-12-04T10:15:47.8194216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8194843Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8195296Z frames [('total', 1)] 2025-12-04T10:15:47.8195568Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8195992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8196464Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8196782Z graph_break [] 2025-12-04T10:15:47.8197064Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8197527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8197967Z frames [('total', 1)] 2025-12-04T10:15:47.8198253Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8198680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8199156Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8199476Z graph_break [] 2025-12-04T10:15:47.8199749Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8200224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8200663Z frames [('total', 1)] 2025-12-04T10:15:47.8201123Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8201556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8202010Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8202399Z graph_break [] 2025-12-04T10:15:47.8202682Z aten_mm_info 
[('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8203717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml - 2025-12-04T10:15:47.8204780Z =========================== short test summary info ============================ 2025-12-04T10:15:47.8206865Z FAILED [0.1043s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:15:47.8208870Z target: aten.mm.default 2025-12-04T10:15:47.8209168Z args[0]: TensorBox( 2025-12-04T10:15:47.8209435Z ReinterpretView( 2025-12-04T10:15:47.8209711Z StorageBox( 2025-12-04T10:15:47.8210394Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8211012Z ), 2025-12-04T10:15:47.8211435Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8211985Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8212308Z stack_traces = {, 2025-12-04T10:15:47.8212862Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8213493Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8213904Z , 2025-12-04T10:15:47.8214109Z } 2025-12-04T10:15:47.8214324Z ) 2025-12-04T10:15:47.8214539Z ) 2025-12-04T10:15:47.8214766Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8215378Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8215993Z )) 2025-12-04T10:15:47.8216112Z 2025-12-04T10:15:47.8216814Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this 
especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8217653Z 2025-12-04T10:15:47.8217657Z 2025-12-04T10:15:47.8217868Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8218786Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8219487Z 2025-12-04T10:15:47.8219759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8220342Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:15:47.8220867Z ========= 1 failed, 11 passed, 3 skipped, 2 rerun in 180.82s (0:03:00) ========= 2025-12-04T10:15:47.8221320Z Got exit code 1 2025-12-04T10:15:47.8221579Z Retrying single test... 2025-12-04T10:15:47.8222189Z W1204 10:14:24.778000 35817 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8223354Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml 2025-12-04T10:15:47.8224246Z ============================= test session starts ============================== 2025-12-04T10:15:47.8224889Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:15:47.8225465Z cachedir: .pytest_cache 2025-12-04T10:15:47.8226154Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:15:47.8226919Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:15:47.8227248Z configfile: pytest.ini 2025-12-04T10:15:47.8227957Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:15:47.8228833Z collecting ... 
collected 18 items / 17 deselected / 1 selected 2025-12-04T10:15:47.8229843Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8230732Z Running 1 items in this shard 2025-12-04T10:15:47.8230954Z 2025-12-04T10:15:47.8231867Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation W1204 10:14:29.226000 35817 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8233040Z ('RERUN', {'yellow': True}) [4.5730s] [100%] 2025-12-04T10:15:47.8233862Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1036s] [100%] 2025-12-04T10:15:47.8235022Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1038s] [100%] 2025-12-04T10:15:47.8235635Z 2025-12-04T10:15:47.8235838Z ==================================== RERUNS ==================================== 2025-12-04T10:15:47.8236405Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8236942Z Traceback (most recent call last): 2025-12-04T10:15:47.8237702Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8238472Z out = f(*inputs) 2025-12-04T10:15:47.8239120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8240021Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8240905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8241732Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8242621Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8243395Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8244197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8245177Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8246149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8246907Z graph.run(*example_inputs) 2025-12-04T10:15:47.8247517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8248163Z return super().run(*args) 2025-12-04T10:15:47.8248750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8249401Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8250077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8250754Z result = super().run_node(n) 2025-12-04T10:15:47.8251378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8252102Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8252846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8253664Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8254481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8255277Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8256053Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8256730Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8257406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8258100Z return autotune_select_algorithm( 2025-12-04T10:15:47.8258910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8259720Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8260430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8261305Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8263048Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8264537Z target: aten.mm.default 2025-12-04T10:15:47.8264820Z args[0]: TensorBox( 2025-12-04T10:15:47.8265099Z ReinterpretView( 2025-12-04T10:15:47.8265369Z StorageBox( 2025-12-04T10:15:47.8265912Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8266531Z ), 2025-12-04T10:15:47.8266957Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8267547Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8267888Z stack_traces = {, 2025-12-04T10:15:47.8268437Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8269060Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8269378Z , 2025-12-04T10:15:47.8269595Z } 2025-12-04T10:15:47.8269807Z ) 2025-12-04T10:15:47.8270001Z ) 2025-12-04T10:15:47.8270240Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8270851Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8271452Z )) 2025-12-04T10:15:47.8271583Z 2025-12-04T10:15:47.8272277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8273115Z 2025-12-04T10:15:47.8273120Z 2025-12-04T10:15:47.8273334Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8274260Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8274965Z 2025-12-04T10:15:47.8275225Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8275847Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8276309Z frames [('total', 1)] 2025-12-04T10:15:47.8276596Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8276913Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8277365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8277814Z graph_break [] 2025-12-04T10:15:47.8278077Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8278613Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8279157Z Traceback (most recent call last): 2025-12-04T10:15:47.8279919Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8280690Z out = f(*inputs) 2025-12-04T10:15:47.8281335Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8282253Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8283131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8283960Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8284785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:15:47.8285566Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8286352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8287332Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8288295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8289047Z graph.run(*example_inputs) 2025-12-04T10:15:47.8289752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8290396Z return super().run(*args) 2025-12-04T10:15:47.8290992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8291628Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8292298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8292974Z result = super().run_node(n) 2025-12-04T10:15:47.8293663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8294383Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8295127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8295957Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8296768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8297571Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8298345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 
2025-12-04T10:15:47.8299028Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8299686Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8300392Z return autotune_select_algorithm( 2025-12-04T10:15:47.8301344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8302148Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8302853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8303740Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8305385Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8306853Z target: aten.mm.default 2025-12-04T10:15:47.8307152Z args[0]: TensorBox( 2025-12-04T10:15:47.8307432Z ReinterpretView( 2025-12-04T10:15:47.8307707Z StorageBox( 2025-12-04T10:15:47.8308252Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8308873Z ), 2025-12-04T10:15:47.8309295Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8309825Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8310167Z stack_traces = {, 2025-12-04T10:15:47.8310719Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8311330Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8311657Z , 2025-12-04T10:15:47.8311871Z } 2025-12-04T10:15:47.8312070Z ) 2025-12-04T10:15:47.8312285Z ) 2025-12-04T10:15:47.8312525Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8313125Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8313744Z )) 2025-12-04T10:15:47.8313880Z 2025-12-04T10:15:47.8314577Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8315399Z 2025-12-04T10:15:47.8315403Z 2025-12-04T10:15:47.8315625Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8316655Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8317356Z 2025-12-04T10:15:47.8317617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8318240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8318699Z frames [('total', 1)] 2025-12-04T10:15:47.8318979Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8319310Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8319837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8320288Z graph_break [] 2025-12-04T10:15:47.8320549Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8321010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8321465Z frames [('total', 1)] 2025-12-04T10:15:47.8321737Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8322219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8322698Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8323014Z graph_break [] 2025-12-04T10:15:47.8323290Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8323690Z =================================== FAILURES =================================== 2025-12-04T10:15:47.8324241Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8324782Z Traceback (most recent call last): 2025-12-04T10:15:47.8325569Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8326346Z out = f(*inputs) 
2025-12-04T10:15:47.8326982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8327836Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8328720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8329547Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8330349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8331125Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8331921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8332890Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8333857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8334617Z graph.run(*example_inputs) 2025-12-04T10:15:47.8335228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8335856Z return super().run(*args) 2025-12-04T10:15:47.8336456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8337109Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8337768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8338443Z result = super().run_node(n) 2025-12-04T10:15:47.8339088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8339811Z return getattr(self, n.op)(n.target, args, kwargs) 
2025-12-04T10:15:47.8340537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8341357Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8342248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8343055Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8343819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8344508Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8345180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8345924Z return autotune_select_algorithm( 2025-12-04T10:15:47.8346740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8347557Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8348261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8349254Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8350897Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8352374Z target: aten.mm.default 2025-12-04T10:15:47.8352664Z args[0]: TensorBox( 2025-12-04T10:15:47.8352928Z ReinterpretView( 2025-12-04T10:15:47.8353202Z StorageBox( 2025-12-04T10:15:47.8353755Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8354358Z ), 2025-12-04T10:15:47.8354779Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8355324Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8355652Z stack_traces = {, 2025-12-04T10:15:47.8356211Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8356837Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8357173Z , 2025-12-04T10:15:47.8357380Z } 2025-12-04T10:15:47.8357594Z ) 2025-12-04T10:15:47.8357812Z ) 2025-12-04T10:15:47.8358037Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8358650Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8359268Z )) 2025-12-04T10:15:47.8359384Z 2025-12-04T10:15:47.8360083Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8360914Z 2025-12-04T10:15:47.8360919Z 2025-12-04T10:15:47.8361132Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8362055Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8362821Z 2025-12-04T10:15:47.8363092Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8363712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8364161Z frames [('total', 1)] 2025-12-04T10:15:47.8364455Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8364788Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8365235Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8365687Z graph_break [] 2025-12-04T10:15:47.8365964Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8366421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8366874Z frames [('total', 1)] 2025-12-04T10:15:47.8367161Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8367660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8368139Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8368473Z graph_break [] 2025-12-04T10:15:47.8368752Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8369202Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8369651Z frames [('total', 1)] 2025-12-04T10:15:47.8369934Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8370343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8370867Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8371193Z graph_break [] 2025-12-04T10:15:47.8371469Z aten_mm_info 
[('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8372498Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml - 2025-12-04T10:15:47.8373591Z =========================== short test summary info ============================ 2025-12-04T10:15:47.8375679Z FAILED [0.1038s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:15:47.8377699Z target: aten.mm.default 2025-12-04T10:15:47.8377981Z args[0]: TensorBox( 2025-12-04T10:15:47.8378253Z ReinterpretView( 2025-12-04T10:15:47.8378523Z StorageBox( 2025-12-04T10:15:47.8379055Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8379671Z ), 2025-12-04T10:15:47.8380086Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8380623Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8380950Z stack_traces = {, 2025-12-04T10:15:47.8381497Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8382119Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8382434Z , 2025-12-04T10:15:47.8382639Z } 2025-12-04T10:15:47.8382839Z ) 2025-12-04T10:15:47.8383031Z ) 2025-12-04T10:15:47.8383259Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8383865Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8384468Z )) 2025-12-04T10:15:47.8384591Z 2025-12-04T10:15:47.8385286Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this 
especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8386120Z 2025-12-04T10:15:47.8386124Z 2025-12-04T10:15:47.8386334Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8387253Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8387955Z 2025-12-04T10:15:47.8388224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8388789Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:15:47.8389295Z ================== 1 failed, 17 deselected, 2 rerun in 4.81s =================== 2025-12-04T10:15:47.8389724Z Got exit code 1 2025-12-04T10:15:47.8389978Z Retrying single test... 2025-12-04T10:15:47.8390593Z W1204 10:14:43.472000 35986 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8391753Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml 2025-12-04T10:15:47.8392647Z ============================= test session starts ============================== 2025-12-04T10:15:47.8393374Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:15:47.8393957Z cachedir: .pytest_cache 2025-12-04T10:15:47.8394656Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:15:47.8395407Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:15:47.8395749Z configfile: pytest.ini 2025-12-04T10:15:47.8396459Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:15:47.8397385Z collecting ... 
collected 18 items / 17 deselected / 1 selected 2025-12-04T10:15:47.8398375Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8399269Z Running 1 items in this shard 2025-12-04T10:15:47.8399476Z 2025-12-04T10:15:47.8400402Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation W1204 10:14:47.897000 35986 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8401745Z ('RERUN', {'yellow': True}) [4.5480s] [100%] 2025-12-04T10:15:47.8402623Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1053s] [100%] 2025-12-04T10:15:47.8403980Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1042s] [100%] 2025-12-04T10:15:47.8404590Z 2025-12-04T10:15:47.8404744Z ==================================== RERUNS ==================================== 2025-12-04T10:15:47.8405308Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8405836Z Traceback (most recent call last): 2025-12-04T10:15:47.8406617Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8407388Z out = f(*inputs) 2025-12-04T10:15:47.8408020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8408873Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8409755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8410577Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8411384Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8412160Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8412953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8413928Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8414881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8415636Z graph.run(*example_inputs) 2025-12-04T10:15:47.8416239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8416868Z return super().run(*args) 2025-12-04T10:15:47.8417459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8418115Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8418783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8419443Z result = super().run_node(n) 2025-12-04T10:15:47.8420075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8420929Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8421656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8422482Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8423294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8424093Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8424931Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8425612Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8426279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8426976Z return autotune_select_algorithm( 2025-12-04T10:15:47.8427773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8428588Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8429291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8430151Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8431795Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8433287Z target: aten.mm.default 2025-12-04T10:15:47.8433578Z args[0]: TensorBox( 2025-12-04T10:15:47.8433835Z ReinterpretView( 2025-12-04T10:15:47.8434099Z StorageBox( 2025-12-04T10:15:47.8434653Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8435258Z ), 2025-12-04T10:15:47.8435676Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8436222Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8436565Z stack_traces = {, 2025-12-04T10:15:47.8437097Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8437713Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8438045Z , 2025-12-04T10:15:47.8438244Z } 2025-12-04T10:15:47.8438450Z ) 2025-12-04T10:15:47.8438658Z ) 2025-12-04T10:15:47.8438879Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8439493Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8440104Z )) 2025-12-04T10:15:47.8440220Z 2025-12-04T10:15:47.8440938Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8441760Z 2025-12-04T10:15:47.8441765Z 2025-12-04T10:15:47.8441980Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8442972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8443694Z 2025-12-04T10:15:47.8443956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8444585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8445030Z frames [('total', 1)] 2025-12-04T10:15:47.8445324Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8445659Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8446104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8446553Z graph_break [] 2025-12-04T10:15:47.8446899Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8447422Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8447965Z Traceback (most recent call last): 2025-12-04T10:15:47.8448742Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8449513Z out = f(*inputs) 2025-12-04T10:15:47.8450154Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8451068Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8451950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8452777Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8453595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in 
_compile_fx_inner 2025-12-04T10:15:47.8454374Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8455174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8456141Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8457111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8457877Z graph.run(*example_inputs) 2025-12-04T10:15:47.8458490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8459122Z return super().run(*args) 2025-12-04T10:15:47.8459718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8460373Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8461040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8461715Z result = super().run_node(n) 2025-12-04T10:15:47.8462353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8463073Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:15:47.8463804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8464639Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8465459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8466263Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8467037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 
2025-12-04T10:15:47.8467723Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8468394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8469085Z return autotune_select_algorithm( 2025-12-04T10:15:47.8469890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8470708Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8471412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8472276Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8474000Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8475492Z target: aten.mm.default 2025-12-04T10:15:47.8475795Z args[0]: TensorBox( 2025-12-04T10:15:47.8476058Z ReinterpretView( 2025-12-04T10:15:47.8476328Z StorageBox( 2025-12-04T10:15:47.8476880Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8477490Z ), 2025-12-04T10:15:47.8477915Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8478546Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8478872Z stack_traces = {, 2025-12-04T10:15:47.8479428Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8480056Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8480387Z , 2025-12-04T10:15:47.8480588Z } 2025-12-04T10:15:47.8480801Z ) 2025-12-04T10:15:47.8481017Z ) 2025-12-04T10:15:47.8481240Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8481855Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8482538Z )) 2025-12-04T10:15:47.8482657Z 2025-12-04T10:15:47.8483357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8484203Z 2025-12-04T10:15:47.8484207Z 2025-12-04T10:15:47.8484416Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8485338Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8486044Z 2025-12-04T10:15:47.8486319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8486935Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8487393Z frames [('total', 1)] 2025-12-04T10:15:47.8487695Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8488029Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8488467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8488917Z graph_break [] 2025-12-04T10:15:47.8489197Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8489645Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8490100Z frames [('total', 1)] 2025-12-04T10:15:47.8490385Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8490797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8491270Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8491599Z graph_break [] 2025-12-04T10:15:47.8491884Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8492268Z =================================== FAILURES =================================== 2025-12-04T10:15:47.8492838Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________ 2025-12-04T10:15:47.8493380Z Traceback (most recent call last): 2025-12-04T10:15:47.8494145Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8494921Z out = f(*inputs) 
2025-12-04T10:15:47.8495572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:15:47.8496438Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:15:47.8497309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:15:47.8498141Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:15:47.8499050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:15:47.8499825Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:15:47.8500628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:15:47.8501766Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:15:47.8502741Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:15:47.8503593Z graph.run(*example_inputs) 2025-12-04T10:15:47.8504205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:15:47.8504851Z return super().run(*args) 2025-12-04T10:15:47.8505453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:15:47.8506090Z self.env[node] = self.run_node(node) 2025-12-04T10:15:47.8506770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:15:47.8507452Z result = super().run_node(n) 2025-12-04T10:15:47.8508071Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:15:47.8508788Z return getattr(self, n.op)(n.target, args, kwargs) 
2025-12-04T10:15:47.8509527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:15:47.8510364Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:15:47.8511170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:15:47.8511969Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:15:47.8512754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:15:47.8513428Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:15:47.8514103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:15:47.8514804Z return autotune_select_algorithm( 2025-12-04T10:15:47.8515618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:15:47.8516426Z return cache(*args, **kwargs) 2025-12-04T10:15:47.8517134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:15:47.8518011Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:15:47.8519653Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8521139Z target: aten.mm.default 2025-12-04T10:15:47.8521436Z args[0]: TensorBox( 2025-12-04T10:15:47.8521713Z ReinterpretView( 2025-12-04T10:15:47.8521969Z StorageBox( 2025-12-04T10:15:47.8522588Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8523207Z ), 2025-12-04T10:15:47.8523628Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8524171Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8524514Z stack_traces = {, 2025-12-04T10:15:47.8525067Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8525683Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8526018Z , 2025-12-04T10:15:47.8526237Z } 2025-12-04T10:15:47.8526438Z ) 2025-12-04T10:15:47.8526655Z ) 2025-12-04T10:15:47.8526999Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8527605Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8528224Z )) 2025-12-04T10:15:47.8528343Z 2025-12-04T10:15:47.8529052Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8529877Z 2025-12-04T10:15:47.8529940Z 2025-12-04T10:15:47.8530165Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8531070Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8531785Z 2025-12-04T10:15:47.8532046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8532670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8533135Z frames [('total', 1)] 2025-12-04T10:15:47.8533411Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8533745Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8534194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8534636Z graph_break [] 2025-12-04T10:15:47.8534912Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8535378Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8535832Z frames [('total', 1)] 2025-12-04T10:15:47.8536105Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8536526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8536992Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8537311Z graph_break [] 2025-12-04T10:15:47.8537587Z aten_mm_info [('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8538044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:15:47.8538487Z frames [('total', 1)] 2025-12-04T10:15:47.8538772Z stats [('calls_captured', 2)] 2025-12-04T10:15:47.8539194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:15:47.8539650Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T10:15:47.8539981Z graph_break [] 2025-12-04T10:15:47.8540255Z aten_mm_info 
[('aten.mm_s97_2000_3000', 1)] 2025-12-04T10:15:47.8541283Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml - 2025-12-04T10:15:47.8542359Z =========================== short test summary info ============================ 2025-12-04T10:15:47.8544450Z FAILED [0.1042s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:15:47.8546454Z target: aten.mm.default 2025-12-04T10:15:47.8546756Z args[0]: TensorBox( 2025-12-04T10:15:47.8547021Z ReinterpretView( 2025-12-04T10:15:47.8547288Z StorageBox( 2025-12-04T10:15:47.8547833Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1])) 2025-12-04T10:15:47.8548439Z ), 2025-12-04T10:15:47.8548862Z FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000), 2025-12-04T10:15:47.8549414Z origins=OrderedSet([slice_1]), 2025-12-04T10:15:47.8549752Z stack_traces = {, 2025-12-04T10:15:47.8550292Z File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f, 2025-12-04T10:15:47.8550914Z x = torch.narrow(a, 1, K, K), 2025-12-04T10:15:47.8551244Z , 2025-12-04T10:15:47.8551447Z } 2025-12-04T10:15:47.8551661Z ) 2025-12-04T10:15:47.8551942Z ) 2025-12-04T10:15:47.8552173Z args[1]: TensorBox(StorageBox( 2025-12-04T10:15:47.8552793Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1])) 2025-12-04T10:15:47.8553409Z )) 2025-12-04T10:15:47.8553527Z 2025-12-04T10:15:47.8554227Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this 
especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:15:47.8555130Z 2025-12-04T10:15:47.8555135Z 2025-12-04T10:15:47.8555343Z To execute this test, run the following from the base repo dir: 2025-12-04T10:15:47.8556269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8556968Z 2025-12-04T10:15:47.8557245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:15:47.8557834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:15:47.8558344Z ================== 1 failed, 17 deselected, 2 rerun in 4.79s =================== 2025-12-04T10:15:47.8558786Z Got exit code 1 2025-12-04T10:15:47.8559455Z FAILED CONSISTENTLY: test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation 2025-12-04T10:15:47.8560480Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:15:47.8561458Z W1204 10:15:02.382000 36155 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:15:47.8562698Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml 2025-12-04T10:15:47.8563599Z ============================= test session starts ============================== 2025-12-04T10:15:47.8564242Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:15:47.8564837Z cachedir: .pytest_cache 2025-12-04T10:15:47.8565534Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:15:47.8566301Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:15:47.8566630Z configfile: pytest.ini 
2025-12-04T10:15:47.8567337Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:15:47.8568213Z collecting ... collected 18 items / 15 deselected / 3 selected 2025-12-04T10:15:47.8568684Z stepcurrent: skipping 15 already run items. 2025-12-04T10:15:47.8569061Z Running 3 items in this shard 2025-12-04T10:15:47.8569264Z 2025-12-04T10:15:47.8569663Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_split_scan PASSED [17.2639s] [ 33%] 2025-12-04T10:15:47.8570557Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_star_dep PASSED [12.8722s] [ 66%] 2025-12-04T10:15:47.8571548Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_unused_input_bandwidth_computation PASSED [13.2386s] [100%] 2025-12-04T10:15:47.8572188Z 2025-12-04T10:15:47.8572964Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml - 2025-12-04T10:15:47.8574059Z ====================== 3 passed, 15 deselected in 43.40s ======================= 2025-12-04T10:15:47.8574986Z The following tests failed consistently: ['test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation'] 2025-12-04T10:15:47.8575723Z 2025-12-04T10:15:47.8576306Z FINISHED PRINTING LOG FILE of inductor/test_kernel_benchmark 1/1 (test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log) 2025-12-04T10:15:47.8577038Z 2025-12-04T10:15:47.8577410Z Finished inductor/test_kernel_benchmark 1/1 ... 
[2025-12-04 10:15:47.800709][3705.410615958], took 4.73min 2025-12-04T10:15:47.8578811Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml 2025-12-04T10:15:47.8825481Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml 2025-12-04T10:15:47.9127149Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml 2025-12-04T10:15:47.9467645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml 2025-12-04T10:15:48.3112572Z Uploading logs for 57119749427 to S3 2025-12-04T10:15:48.3552672Z Uploading artifacts took 0.37 seconds 2025-12-04T10:15:48.3553110Z inductor/test_kernel_benchmark 1/1 failed! 2025-12-04T10:15:48.3557703Z Running inductor/test_torchinductor_opinfo 3/17 ... [2025-12-04 10:15:48.355592][3705.965500361] 2025-12-04T10:15:48.3558392Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:15:48.3562916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=3', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:15:48.356027] 2025-12-04T10:23:55.6365138Z 2025-12-04T10:23:55.6366672Z inductor/test_torchinductor_opinfo 3/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_3.17_09d50cf3d15b8ee9_.log 2025-12-04T10:23:55.6499993Z Running 231 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hypot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float16 2025-12-04T10:23:55.6630499Z 2025-12-04T10:23:55.6630917Z Finished inductor/test_torchinductor_opinfo 3/17 ... [2025-12-04 10:23:55.636809][4193.246718126], took 8.12min 2025-12-04T10:23:55.6632346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.xml 2025-12-04T10:23:55.7251137Z Running inductor/test_torchinductor_opinfo 8/17 ... [2025-12-04 10:23:55.724806][4193.33471401] 2025-12-04T10:23:55.7251750Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:23:55.7254786Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=8', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:23:55.725192] 2025-12-04T10:34:25.0905132Z 2025-12-04T10:34:25.0906231Z inductor/test_torchinductor_opinfo 8/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_8.17_f4805f992a426064_.log 2025-12-04T10:34:25.1016390Z Running 190 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cond_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eig_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_unpack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_bartlett_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float64
2025-12-04T10:34:25.1124157Z
2025-12-04T10:34:25.1124576Z Finished inductor/test_torchinductor_opinfo 8/17 ... [2025-12-04 10:34:25.090526][4822.70043385], took 10.49min
2025-12-04T10:34:25.1126013Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.xml
2025-12-04T10:34:25.1780104Z Running inductor/test_torchinductor_opinfo 13/17 ... [2025-12-04 10:34:25.177707][4822.787615175]
2025-12-04T10:34:25.1780725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:34:25.1784180Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=13', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:34:25.178121]
2025-12-04T10:45:00.1141950Z
2025-12-04T10:45:00.1146092Z inductor/test_torchinductor_opinfo 13/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_13.17_50bb27b4d6383988_.log
2025-12-04T10:45:00.1269336Z Running 210 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_decomposed_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exponential_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32
2025-12-04T10:45:00.1389807Z
2025-12-04T10:45:00.1390222Z Finished inductor/test_torchinductor_opinfo 13/17 ... [2025-12-04 10:45:00.114285][5457.72419387], took 10.58min
2025-12-04T10:45:00.1391683Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.xml
2025-12-04T10:45:00.5384128Z Uploading artifacts took 0.34 seconds
2025-12-04T10:45:00.5388283Z Running inductor/test_pattern_matcher 1/1 ... [2025-12-04 10:45:00.538632][5458.148539749]
2025-12-04T10:45:00.5389046Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:45:00.5393309Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pattern_matcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:45:00.539059]
2025-12-04T10:46:59.4237778Z
2025-12-04T10:46:59.4239080Z PRINTING LOG FILE of inductor/test_pattern_matcher 1/1 (test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log)
2025-12-04T10:46:59.4240967Z W1204 10:45:09.196000 77296 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4242216Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml
2025-12-04T10:46:59.4243116Z ============================= test session starts ==============================
2025-12-04T10:46:59.4243794Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:46:59.4244387Z cachedir: .pytest_cache
2025-12-04T10:46:59.4245071Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:46:59.4245845Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:46:59.4247447Z configfile: pytest.ini
2025-12-04T10:46:59.4248195Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:46:59.4248971Z collecting ... collected 52 items
2025-12-04T10:46:59.4249377Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:46:59.4272178Z Running 52 items in this shard: test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_alpha_beta_with_pointwise, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_broadcasting_bias, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_dtype_mismatch, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_symbolic_scalar, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_bmm_to_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_bound_method_pattern_matcher, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_cuda, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_xpu, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_splitwithsizes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_duplicate_search, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_epilogue, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_gating, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fwd_only_generate_original_aten_meta, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_input_output_same, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations1, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations2, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations3, 
test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_with_mutation, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_bad_cases, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_cpu, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_epi_works, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_exhaustive_dtypes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_gating, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mm_plus_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_multioutput_register_replacement, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mutation_op_matching, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_convert, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_cumsum, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair_3d, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair_dynamic_shapes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_noop_pass_with_remove_passes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_pointless_clones, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_replace_mul_zero, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_scaled_softmax, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_serialized_patterns_up_to_date, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_splitwithsizes_cat, 
test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_stable_topological_sort, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case0, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case1, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case2, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_symint_pattern_matching, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unfuse_bias_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case0, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case1 2025-12-04T10:46:59.4295436Z 2025-12-04T10:46:59.4296229Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm W1204 10:45:14.863000 77296 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4297243Z PASSED [6.6532s] [ 1%] 2025-12-04T10:46:59.4297894Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_alpha_beta_with_pointwise PASSED [0.6349s] [ 3%] 2025-12-04T10:46:59.4298916Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_broadcasting_bias PASSED [0.1976s] [ 5%] 2025-12-04T10:46:59.4299894Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_dtype_mismatch PASSED [0.5110s] [ 7%] 2025-12-04T10:46:59.4301049Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_symbolic_scalar PASSED [0.7121s] [ 9%] 2025-12-04T10:46:59.4301948Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_bmm_to_mm PASSED [0.2353s] [ 11%] 2025-12-04T10:46:59.4302891Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_bound_method_pattern_matcher PASSED [1.2701s] [ 13%] 2025-12-04T10:46:59.4303840Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_addmm PASSED [0.1424s] [ 15%] 2025-12-04T10:46:59.4304672Z 
inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_mm PASSED [0.1369s] [ 17%] 2025-12-04T10:46:59.4305542Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_cuda PASSED [1.3428s] [ 19%] 2025-12-04T10:46:59.4306479Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_xpu PASSED [1.2430s] [ 21%] 2025-12-04T10:46:59.4307417Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_splitwithsizes PASSED [2.3046s] [ 23%] 2025-12-04T10:46:59.4308363Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_duplicate_search PASSED [0.1865s] [ 25%] 2025-12-04T10:46:59.4309402Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul SKIPPED [0.0003s] (templates require big gpu) [ 26%] 2025-12-04T10:46:59.4310622Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_epilogue SKIPPED [0.0002s] (templates require big gpu) [ 28%] 2025-12-04T10:46:59.4311875Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_gating SKIPPED [0.0002s] (templates require big gpu) [ 30%] 2025-12-04T10:46:59.4313041Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fwd_only_generate_original_aten_meta PASSED [0.0088s] [ 32%] 2025-12-04T10:46:59.4314045Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_input_output_same PASSED [0.7180s] [ 34%] 2025-12-04T10:46:59.4315073Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations1 PASSED [0.6273s] [ 36%] 2025-12-04T10:46:59.4316207Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations2 PASSED [0.5773s] [ 38%] 2025-12-04T10:46:59.4317336Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations3 PASSED [0.5795s] [ 40%] 2025-12-04T10:46:59.4318519Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_with_mutation PASSED [1.2711s] [ 42%] 2025-12-04T10:46:59.4319554Z 
inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm SKIPPED [0.0003s] (templates require big gpu) [ 44%] 2025-12-04T10:46:59.4320704Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_bad_cases SKIPPED [0.0002s] (templates require big gpu) [ 46%] 2025-12-04T10:46:59.4321805Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_cpu PASSED [0.7236s] [ 48%] 2025-12-04T10:46:59.4322835Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_epi_works SKIPPED [0.0003s] (templates require big gpu) [ 50%] 2025-12-04T10:46:59.4324191Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_exhaustive_dtypes SKIPPED [0.0002s] (templates require big gpu) [ 51%] 2025-12-04T10:46:59.4325423Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_gating SKIPPED [0.0002s] (templates require big gpu) [ 53%] 2025-12-04T10:46:59.4326450Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mm_plus_mm PASSED [0.7748s] [ 55%] 2025-12-04T10:46:59.4327409Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_multioutput_register_replacement PASSED [0.8276s] [ 57%] 2025-12-04T10:46:59.4328438Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mutation_op_matching PASSED [0.0048s] [ 59%] 2025-12-04T10:46:59.4329569Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.1245s] [ 61%] 2025-12-04T10:46:59.4330848Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0886s] [ 61%] 2025-12-04T10:46:59.4332021Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0865s] [ 61%] 2025-12-04T10:46:59.4332644Z 2025-12-04T10:46:59.4332783Z ==================================== RERUNS ==================================== 2025-12-04T10:46:59.4333358Z _________ 
TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4333905Z Traceback (most recent call last): 2025-12-04T10:46:59.4334690Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4335523Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4336269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4336986Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4337668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4338523Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4339405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4340226Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4341050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4341832Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4342630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4343590Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4344562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4345322Z graph.run(*example_inputs) 2025-12-04T10:46:59.4345929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4346558Z return super().run(*args) 2025-12-04T10:46:59.4347232Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4347891Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4348549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4349236Z result = super().run_node(n) 2025-12-04T10:46:59.4349882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4350609Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4351427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4352262Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4353095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4353884Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4354672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4355364Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4356042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4356731Z return autotune_select_algorithm( 2025-12-04T10:46:59.4357546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4358373Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4359079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4359948Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4361594Z torch._inductor.exc.InductorError: 
LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4363168Z target: aten.mm.default 2025-12-04T10:46:59.4363485Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4364073Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4364677Z )) 2025-12-04T10:46:59.4364919Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4365505Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4366103Z )) 2025-12-04T10:46:59.4366246Z 2025-12-04T10:46:59.4366958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4367784Z 2025-12-04T10:46:59.4367788Z 2025-12-04T10:46:59.4368005Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4368961Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4369702Z 2025-12-04T10:46:59.4369965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4370593Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4371043Z frames [('total', 1)] 2025-12-04T10:46:59.4371344Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4371779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4372468Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4373030Z graph_break [] 2025-12-04T10:46:59.4373308Z aten_mm_info [('aten.mm_16_32_24', 1)] 
2025-12-04T10:46:59.4373921Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4374457Z Traceback (most recent call last): 2025-12-04T10:46:59.4375260Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4376104Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4376858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4377562Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4378325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4379181Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4380058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4380885Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4381711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4382500Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4383283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4384257Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4385223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4385984Z graph.run(*example_inputs) 2025-12-04T10:46:59.4386581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4387223Z return super().run(*args) 
2025-12-04T10:46:59.4387834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4388475Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4389155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4389830Z result = super().run_node(n) 2025-12-04T10:46:59.4390471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4391181Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4391933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4392763Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4393573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4394373Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4395148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4395832Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4396495Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4397197Z return autotune_select_algorithm( 2025-12-04T10:46:59.4398013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4398842Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4399535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4400418Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4402426Z 
torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4403935Z target: aten.mm.default 2025-12-04T10:46:59.4404241Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4404843Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4405450Z )) 2025-12-04T10:46:59.4405678Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4406362Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4406963Z )) 2025-12-04T10:46:59.4407083Z 2025-12-04T10:46:59.4407783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4408628Z 2025-12-04T10:46:59.4408633Z 2025-12-04T10:46:59.4408856Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4409796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4410520Z 2025-12-04T10:46:59.4410792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4411415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4411862Z frames [('total', 1)] 2025-12-04T10:46:59.4412160Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4412591Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4413257Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4413831Z graph_break [] 2025-12-04T10:46:59.4414102Z 
aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4414544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4414999Z frames [('total', 1)] 2025-12-04T10:46:59.4415287Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4415713Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4416378Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4416942Z graph_break [] 2025-12-04T10:46:59.4417209Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4417578Z =================================== FAILURES =================================== 2025-12-04T10:46:59.4418158Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4418703Z Traceback (most recent call last): 2025-12-04T10:46:59.4419496Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4420315Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4421060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4421773Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4422456Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4423312Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4424197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4425030Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4425837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4426621Z mb_compiled_graph = 
fx_codegen_and_compile( 2025-12-04T10:46:59.4427486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4428468Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4429427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4430188Z graph.run(*example_inputs) 2025-12-04T10:46:59.4430797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4431427Z return super().run(*args) 2025-12-04T10:46:59.4432146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4432807Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4433489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4434158Z result = super().run_node(n) 2025-12-04T10:46:59.4434805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4435532Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4436265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4437105Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4437929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4438743Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4439511Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4440205Z out = decomp_fn(*args, **kwargs) 
2025-12-04T10:46:59.4440883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4441591Z return autotune_select_algorithm( 2025-12-04T10:46:59.4442468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4443298Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4444010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4444877Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4446523Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4448011Z target: aten.mm.default 2025-12-04T10:46:59.4448322Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4448913Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4449513Z )) 2025-12-04T10:46:59.4449753Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4450333Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4450927Z )) 2025-12-04T10:46:59.4451055Z 2025-12-04T10:46:59.4451755Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4452585Z 2025-12-04T10:46:59.4452589Z 2025-12-04T10:46:59.4452810Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4453751Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4454477Z 2025-12-04T10:46:59.4454738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4455457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4455914Z frames [('total', 1)] 2025-12-04T10:46:59.4456196Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4456629Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4457312Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4457882Z graph_break [] 2025-12-04T10:46:59.4458143Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4458663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4459124Z frames [('total', 1)] 2025-12-04T10:46:59.4459402Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4459830Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4460506Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4461068Z graph_break [] 2025-12-04T10:46:59.4461337Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4461786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4462241Z frames [('total', 1)] 2025-12-04T10:46:59.4462515Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4462941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4463620Z inductor 
[('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4464176Z graph_break [] 2025-12-04T10:46:59.4464446Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4465453Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml - 2025-12-04T10:46:59.4466526Z =========================== short test summary info ============================ 2025-12-04T10:46:59.4468622Z FAILED [0.0865s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4470645Z target: aten.mm.default 2025-12-04T10:46:59.4470959Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4471558Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4472147Z )) 2025-12-04T10:46:59.4472388Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4472977Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4473561Z )) 2025-12-04T10:46:59.4473690Z 2025-12-04T10:46:59.4474389Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4475225Z 2025-12-04T10:46:59.4475229Z 2025-12-04T10:46:59.4475441Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4476383Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4477107Z 2025-12-04T10:46:59.4477367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4477953Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:46:59.4478475Z ============== 1 failed, 23 passed, 8 skipped, 2 rerun in 22.06s =============== 2025-12-04T10:46:59.4478922Z Got exit code 1 2025-12-04T10:46:59.4479174Z Retrying single test... 2025-12-04T10:46:59.4479796Z W1204 10:45:42.090000 78135 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4481034Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml 2025-12-04T10:46:59.4482009Z ============================= test session starts ============================== 2025-12-04T10:46:59.4482647Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:46:59.4483236Z cachedir: .pytest_cache 2025-12-04T10:46:59.4483934Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:46:59.4484765Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:46:59.4485110Z configfile: pytest.ini 2025-12-04T10:46:59.4485818Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:46:59.4486687Z collecting ... 
collected 52 items / 51 deselected / 1 selected 2025-12-04T10:46:59.4487698Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4488611Z Running 1 items in this shard 2025-12-04T10:46:59.4488819Z 2025-12-04T10:46:59.4489753Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm W1204 10:45:46.380000 78135 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4490946Z ('RERUN', {'yellow': True}) [4.2298s] [100%] 2025-12-04T10:46:59.4491773Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0895s] [100%] 2025-12-04T10:46:59.4492974Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0887s] [100%] 2025-12-04T10:46:59.4493588Z 2025-12-04T10:46:59.4493746Z ==================================== RERUNS ==================================== 2025-12-04T10:46:59.4494322Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4494855Z Traceback (most recent call last): 2025-12-04T10:46:59.4495647Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4496479Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4497209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4497928Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4498619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4499475Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4500346Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4501353Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4502183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4502970Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4503761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4504738Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4505716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4506466Z graph.run(*example_inputs) 2025-12-04T10:46:59.4507080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4507728Z return super().run(*args) 2025-12-04T10:46:59.4508446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4509093Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4509773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4510455Z result = super().run_node(n) 2025-12-04T10:46:59.4511086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4511895Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4512637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4513472Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4514282Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4515092Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4515871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4516561Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4517222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4517923Z return autotune_select_algorithm( 2025-12-04T10:46:59.4518735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4519547Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4520245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4521118Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4522842Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4524319Z target: aten.mm.default 2025-12-04T10:46:59.4524636Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4525240Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4525849Z )) 2025-12-04T10:46:59.4526075Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4526676Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4527277Z )) 2025-12-04T10:46:59.4527395Z 2025-12-04T10:46:59.4528096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4528936Z 2025-12-04T10:46:59.4528941Z 2025-12-04T10:46:59.4529155Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4530096Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4530817Z 2025-12-04T10:46:59.4531088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4531697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4532158Z frames [('total', 1)] 2025-12-04T10:46:59.4532447Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4532985Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4533657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4534106Z graph_break [] 2025-12-04T10:46:59.4534378Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4534968Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4535519Z Traceback (most recent call last): 2025-12-04T10:46:59.4536323Z File 
"/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4537162Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4537894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4538669Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4539363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4540203Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4541084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4541914Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4542733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4543502Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4544300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4545273Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4546243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4546993Z graph.run(*example_inputs) 2025-12-04T10:46:59.4547600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4548243Z return super().run(*args) 2025-12-04T10:46:59.4548834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4549488Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4550163Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4550841Z result = super().run_node(n) 2025-12-04T10:46:59.4551470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4552195Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4552937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4553756Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4554577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4555380Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4556156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4556830Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4557505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4558208Z return autotune_select_algorithm( 2025-12-04T10:46:59.4559017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4559833Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4560545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4561428Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4563229Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. 
please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4564716Z target: aten.mm.default 2025-12-04T10:46:59.4565031Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4565626Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4566217Z )) 2025-12-04T10:46:59.4566511Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4567100Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4567684Z )) 2025-12-04T10:46:59.4567812Z 2025-12-04T10:46:59.4568510Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4569347Z 2025-12-04T10:46:59.4569356Z 2025-12-04T10:46:59.4569567Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4570506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4571227Z 2025-12-04T10:46:59.4571502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4572112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4572575Z frames [('total', 1)] 2025-12-04T10:46:59.4572871Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4573405Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4574099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4574549Z graph_break [] 2025-12-04T10:46:59.4574825Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4587444Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:46:59.4587915Z frames [('total', 1)] 2025-12-04T10:46:59.4588197Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4588641Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4589331Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4589889Z graph_break [] 2025-12-04T10:46:59.4590140Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4590540Z =================================== FAILURES =================================== 2025-12-04T10:46:59.4591121Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4591654Z Traceback (most recent call last): 2025-12-04T10:46:59.4592457Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4593297Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4594045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4594745Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4595445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4596299Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4597175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4598012Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4598833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4599618Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4600533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4601803Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4602779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4603550Z graph.run(*example_inputs) 2025-12-04T10:46:59.4604150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4604917Z return super().run(*args) 2025-12-04T10:46:59.4605518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4606156Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4606835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4607513Z result = super().run_node(n) 2025-12-04T10:46:59.4608159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4608875Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4609623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4610454Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4611265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4612078Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4612857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4613546Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4614211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4614915Z 
return autotune_select_algorithm( 2025-12-04T10:46:59.4615730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4616541Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4617253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4618132Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4619777Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4621263Z target: aten.mm.default 2025-12-04T10:46:59.4621562Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4622169Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4622767Z )) 2025-12-04T10:46:59.4622993Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4623585Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4624186Z )) 2025-12-04T10:46:59.4624307Z 2025-12-04T10:46:59.4625011Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4625851Z 2025-12-04T10:46:59.4625855Z 2025-12-04T10:46:59.4626068Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4627014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4627741Z 2025-12-04T10:46:59.4628128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4628756Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4629207Z frames [('total', 1)] 2025-12-04T10:46:59.4629503Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4630049Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4630726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4631234Z graph_break [] 2025-12-04T10:46:59.4631506Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4631946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4632400Z frames [('total', 1)] 2025-12-04T10:46:59.4632686Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4633110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4633789Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4634357Z graph_break [] 2025-12-04T10:46:59.4634627Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4635064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4635517Z frames [('total', 1)] 2025-12-04T10:46:59.4635802Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4636213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4636882Z inductor 
[('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4637448Z graph_break [] 2025-12-04T10:46:59.4637718Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4638720Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml - 2025-12-04T10:46:59.4639792Z =========================== short test summary info ============================ 2025-12-04T10:46:59.4641975Z FAILED [0.0887s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4644002Z target: aten.mm.default 2025-12-04T10:46:59.4644303Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4644909Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4645510Z )) 2025-12-04T10:46:59.4645735Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4646331Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4646930Z )) 2025-12-04T10:46:59.4647050Z 2025-12-04T10:46:59.4647760Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4648586Z 2025-12-04T10:46:59.4648591Z 2025-12-04T10:46:59.4648802Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4649743Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4650483Z 2025-12-04T10:46:59.4650753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4651335Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:46:59.4651839Z ================== 1 failed, 51 deselected, 2 rerun in 4.44s =================== 2025-12-04T10:46:59.4652278Z Got exit code 1 2025-12-04T10:46:59.4652542Z Retrying single test... 2025-12-04T10:46:59.4653345Z W1204 10:46:00.368000 78304 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4654493Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml 2025-12-04T10:46:59.4655388Z ============================= test session starts ============================== 2025-12-04T10:46:59.4656042Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:46:59.4656623Z cachedir: .pytest_cache 2025-12-04T10:46:59.4657381Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:46:59.4658148Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:46:59.4658495Z configfile: pytest.ini 2025-12-04T10:46:59.4659192Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:46:59.4660072Z collecting ... 
collected 52 items / 51 deselected / 1 selected 2025-12-04T10:46:59.4661099Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4662015Z Running 1 items in this shard 2025-12-04T10:46:59.4662221Z 2025-12-04T10:46:59.4663139Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm W1204 10:46:04.679000 78304 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4664317Z ('RERUN', {'yellow': True}) [4.2499s] [100%] 2025-12-04T10:46:59.4665150Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0902s] [100%] 2025-12-04T10:46:59.4666342Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0875s] [100%] 2025-12-04T10:46:59.4666956Z 2025-12-04T10:46:59.4667100Z ==================================== RERUNS ==================================== 2025-12-04T10:46:59.4667669Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4668211Z Traceback (most recent call last): 2025-12-04T10:46:59.4668993Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4669821Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4670560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4671277Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4671954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4672809Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4673695Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4674523Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4675330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4676107Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4676902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4677881Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4678835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4679596Z graph.run(*example_inputs) 2025-12-04T10:46:59.4680205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4680901Z return super().run(*args) 2025-12-04T10:46:59.4681502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4682230Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4682914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4683583Z result = super().run_node(n) 2025-12-04T10:46:59.4684228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4685020Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4685749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4686578Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4687408Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4688206Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4688969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4689659Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4690331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4691019Z return autotune_select_algorithm( 2025-12-04T10:46:59.4691834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4692653Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4693360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4694229Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4695869Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4697357Z target: aten.mm.default 2025-12-04T10:46:59.4697667Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4698248Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4698857Z )) 2025-12-04T10:46:59.4699099Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4699678Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4700277Z )) 2025-12-04T10:46:59.4700407Z 2025-12-04T10:46:59.4701284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4702119Z 2025-12-04T10:46:59.4702123Z 2025-12-04T10:46:59.4702347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4703292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4704015Z 2025-12-04T10:46:59.4704277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4704907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4705364Z frames [('total', 1)] 2025-12-04T10:46:59.4705647Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4706192Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4706878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4707327Z graph_break [] 2025-12-04T10:46:59.4707716Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4708244Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4708789Z Traceback (most recent call last): 2025-12-04T10:46:59.4709566Z File 
"/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4710399Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4711139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4711958Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4712637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4713492Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4714382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4715198Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4716014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4716796Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4717593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4718568Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4719523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4720287Z graph.run(*example_inputs) 2025-12-04T10:46:59.4720897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4721548Z return super().run(*args) 2025-12-04T10:46:59.4722201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4722857Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4723535Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4724197Z result = super().run_node(n) 2025-12-04T10:46:59.4724840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4725572Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4726316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4727128Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4727952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4728750Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4729529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4730203Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4730875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4731577Z return autotune_select_algorithm( 2025-12-04T10:46:59.4732378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4733199Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4733903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4734783Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4736527Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. 
please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4738015Z target: aten.mm.default 2025-12-04T10:46:59.4738335Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4738942Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4740097Z )) 2025-12-04T10:46:59.4740340Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4740935Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4741520Z )) 2025-12-04T10:46:59.4741653Z 2025-12-04T10:46:59.4742354Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4743190Z 2025-12-04T10:46:59.4743195Z 2025-12-04T10:46:59.4743403Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4744343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4745064Z 2025-12-04T10:46:59.4745337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4745951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4746408Z frames [('total', 1)] 2025-12-04T10:46:59.4746699Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4747229Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4747913Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4748362Z graph_break [] 2025-12-04T10:46:59.4748626Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4749078Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:46:59.4749528Z frames [('total', 1)] 2025-12-04T10:46:59.4749816Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4750227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4750906Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4751470Z graph_break [] 2025-12-04T10:46:59.4751736Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4752116Z =================================== FAILURES =================================== 2025-12-04T10:46:59.4752688Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________ 2025-12-04T10:46:59.4753220Z Traceback (most recent call last): 2025-12-04T10:46:59.4754020Z File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4754860Z ret, code = run_and_get_code(opt_fn, *args) 2025-12-04T10:46:59.4755603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:46:59.4756299Z result = fn(*args, **kwargs) 2025-12-04T10:46:59.4756985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper 2025-12-04T10:46:59.4757838Z raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 2025-12-04T10:46:59.4758927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner 2025-12-04T10:46:59.4759752Z raise InductorError(e, currentframe()).with_traceback( 2025-12-04T10:46:59.4760573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner 2025-12-04T10:46:59.4761353Z mb_compiled_graph = fx_codegen_and_compile( 2025-12-04T10:46:59.4762307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 
1757, in fx_codegen_and_compile 2025-12-04T10:46:59.4763286Z return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) 2025-12-04T10:46:59.4764252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile 2025-12-04T10:46:59.4765014Z graph.run(*example_inputs) 2025-12-04T10:46:59.4765609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run 2025-12-04T10:46:59.4766324Z return super().run(*args) 2025-12-04T10:46:59.4766924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run 2025-12-04T10:46:59.4767582Z self.env[node] = self.run_node(node) 2025-12-04T10:46:59.4768250Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node 2025-12-04T10:46:59.4768937Z result = super().run_node(n) 2025-12-04T10:46:59.4769582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node 2025-12-04T10:46:59.4770295Z return getattr(self, n.op)(n.target, args, kwargs) 2025-12-04T10:46:59.4771035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function 2025-12-04T10:46:59.4771862Z raise LoweringException(e, target, args, kwargs).with_traceback( 2025-12-04T10:46:59.4772688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function 2025-12-04T10:46:59.4773471Z out = lowerings[target](*args, **kwargs) # type: ignore[index] 2025-12-04T10:46:59.4774247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped 2025-12-04T10:46:59.4774935Z out = decomp_fn(*args, **kwargs) 2025-12-04T10:46:59.4775601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm 2025-12-04T10:46:59.4776313Z 
return autotune_select_algorithm( 2025-12-04T10:46:59.4777127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm 2025-12-04T10:46:59.4777951Z return cache(*args, **kwargs) 2025-12-04T10:46:59.4778643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__ 2025-12-04T10:46:59.4779530Z raise self.create_no_valid_choices(name, "No choices exist for backend.") 2025-12-04T10:46:59.4781326Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4782818Z target: aten.mm.default 2025-12-04T10:46:59.4783126Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4783735Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4784337Z )) 2025-12-04T10:46:59.4784563Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4785150Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4785746Z )) 2025-12-04T10:46:59.4785861Z 2025-12-04T10:46:59.4786570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4787401Z 2025-12-04T10:46:59.4787406Z 2025-12-04T10:46:59.4787616Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4788562Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4789384Z 2025-12-04T10:46:59.4789648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4790268Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4790720Z frames [('total', 1)] 2025-12-04T10:46:59.4791008Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4791672Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4792406Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4792920Z graph_break [] 2025-12-04T10:46:59.4793192Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4793648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4794093Z frames [('total', 1)] 2025-12-04T10:46:59.4794381Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4794802Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4795475Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4796044Z graph_break [] 2025-12-04T10:46:59.4796313Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4796751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:46:59.4797207Z frames [('total', 1)] 2025-12-04T10:46:59.4797494Z stats [('calls_captured', 2)] 2025-12-04T10:46:59.4797924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:46:59.4798597Z inductor 
[('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)] 2025-12-04T10:46:59.4799163Z graph_break [] 2025-12-04T10:46:59.4799434Z aten_mm_info [('aten.mm_16_32_24', 1)] 2025-12-04T10:46:59.4800427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml - 2025-12-04T10:46:59.4801663Z =========================== short test summary info ============================ 2025-12-04T10:46:59.4803835Z FAILED [0.0875s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 2025-12-04T10:46:59.4805856Z target: aten.mm.default 2025-12-04T10:46:59.4806179Z args[0]: TensorBox(StorageBox( 2025-12-04T10:46:59.4806768Z InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1])) 2025-12-04T10:46:59.4807366Z )) 2025-12-04T10:46:59.4807604Z args[1]: TensorBox(StorageBox( 2025-12-04T10:46:59.4808176Z InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1])) 2025-12-04T10:46:59.4808772Z )) 2025-12-04T10:46:59.4808895Z 2025-12-04T10:46:59.4809604Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:46:59.4810431Z 2025-12-04T10:46:59.4810435Z 2025-12-04T10:46:59.4810659Z To execute this test, run the following from the base repo dir: 2025-12-04T10:46:59.4811586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4812329Z 2025-12-04T10:46:59.4812594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:46:59.4813178Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:46:59.4813691Z ================== 1 failed, 51 deselected, 2 rerun in 4.46s =================== 2025-12-04T10:46:59.4814114Z Got exit code 1 2025-12-04T10:46:59.4814945Z FAILED CONSISTENTLY: test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm 2025-12-04T10:46:59.4815998Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:46:59.4816960Z W1204 10:46:18.869000 78473 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4818116Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml 2025-12-04T10:46:59.4819108Z ============================= test session starts ============================== 2025-12-04T10:46:59.4819764Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:46:59.4820340Z cachedir: .pytest_cache 2025-12-04T10:46:59.4821037Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:46:59.4821813Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:46:59.4822161Z configfile: pytest.ini 2025-12-04T10:46:59.4822863Z plugins: hypothesis-6.56.4, 
cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:46:59.4823747Z collecting ... collected 52 items / 32 deselected / 20 selected 2025-12-04T10:46:59.4824244Z stepcurrent: skipping 32 already run items. 2025-12-04T10:46:59.4824616Z Running 20 items in this shard 2025-12-04T10:46:59.4824837Z 2025-12-04T10:46:59.4825246Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_convert PASSED [1.8224s] [ 5%] 2025-12-04T10:46:59.4826189Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_cumsum PASSED [7.4792s] [ 10%] 2025-12-04T10:46:59.4827149Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair PASSED [0.0165s] [ 15%] 2025-12-04T10:46:59.4828135Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair_3d PASSED [0.0142s] [ 20%] 2025-12-04T10:46:59.4829118Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair PASSED [0.0214s] [ 25%] 2025-12-04T10:46:59.4830145Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair_dynamic_shapes PASSED [0.1950s] [ 30%] 2025-12-04T10:46:59.4831240Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_noop_pass_with_remove_passes PASSED [0.2867s] [ 35%] 2025-12-04T10:46:59.4832272Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_pointless_clones PASSED [0.1647s] [ 40%] 2025-12-04T10:46:59.4833236Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_replace_mul_zero PASSED [0.1087s] [ 45%] 2025-12-04T10:46:59.4834158Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_scaled_softmax PASSED [10.5313s] [ 50%] 2025-12-04T10:46:59.4835146Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_serialized_patterns_up_to_date PASSED [10.1510s] [ 55%] 2025-12-04T10:46:59.4836145Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_splitwithsizes_cat PASSED [1.4169s] [ 60%] 
2025-12-04T10:46:59.4837118Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_stable_topological_sort PASSED [0.0040s] [ 65%] 2025-12-04T10:46:59.4838141Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case0 PASSED [0.6030s] [ 70%] 2025-12-04T10:46:59.4839204Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case1 PASSED [0.5920s] [ 75%] 2025-12-04T10:46:59.4840247Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case2 PASSED [0.6164s] [ 80%] 2025-12-04T10:46:59.4841275Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_symint_pattern_matching PASSED [0.8794s] [ 85%] 2025-12-04T10:46:59.4842769Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unfuse_bias_addmm W1204 10:46:55.000000 78473 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:46:59.4843924Z PASSED [2.1539s] [ 90%] 2025-12-04T10:46:59.4844570Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case0 PASSED [0.4964s] [ 95%] 2025-12-04T10:46:59.4845656Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case1 PASSED [0.6324s] [100%] 2025-12-04T10:46:59.4846262Z 2025-12-04T10:46:59.4847058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml - 2025-12-04T10:46:59.4848204Z ====================== 20 passed, 32 deselected in 38.24s ====================== 2025-12-04T10:46:59.4849135Z The following tests failed consistently: ['test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm'] 2025-12-04T10:46:59.4849900Z 2025-12-04T10:46:59.4850475Z FINISHED PRINTING LOG FILE of inductor/test_pattern_matcher 1/1 (test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log) 
2025-12-04T10:46:59.4851202Z 2025-12-04T10:46:59.4851565Z Finished inductor/test_pattern_matcher 1/1 ... [2025-12-04 10:46:59.424124][5577.034031169], took 1.98min 2025-12-04T10:46:59.4852929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml 2025-12-04T10:46:59.4995519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml 2025-12-04T10:46:59.5298545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml 2025-12-04T10:46:59.5630593Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml 2025-12-04T10:46:59.8902850Z Uploading logs for 57119749427 to S3 2025-12-04T10:46:59.9364511Z Uploading artifacts took 0.35 seconds 2025-12-04T10:46:59.9364919Z inductor/test_pattern_matcher 1/1 failed! 2025-12-04T10:46:59.9369553Z Running inductor/test_cuda_repro 1/1 ... [2025-12-04 10:46:59.936798][5577.546706283] 2025-12-04T10:46:59.9370110Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:46:59.9374583Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:46:59.937243] 2025-12-04T10:52:00.8857617Z 2025-12-04T10:52:00.8858608Z PRINTING LOG FILE of inductor/test_cuda_repro 1/1 (test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log) 2025-12-04T10:52:00.8859976Z W1204 10:47:08.872000 79656 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.8861656Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml 2025-12-04T10:52:00.8862834Z ============================= test session starts ============================== 2025-12-04T10:52:00.8863715Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:00.8864663Z cachedir: .pytest_cache 2025-12-04T10:52:00.8865446Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:00.8866611Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:00.8867111Z configfile: pytest.ini 2025-12-04T10:52:00.8867983Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:00.8868961Z collecting ... 
collected 96 items 2025-12-04T10:52:00.8869615Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:52:00.8921228Z Running 96 items in this shard: test/inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling, test/inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1, test/inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16, test/inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_backward_context, test/inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue, test/inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms, test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses, test/inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned, test/inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts, test/inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic, test/inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants, test/inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu, test/inductor/test_cuda_repro.py::CudaReproTests::test_full_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_identity_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask, test/inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue100806, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103461, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103481, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue104759, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924, test/inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size, test/inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd, test/inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor, test/inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile, test/inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor, test/inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices, test/inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one, test/inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device, test/inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion, test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile, test/inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch, test/inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order, test/inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address, test/inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims, test/inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed, test/inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop, test/inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328, test/inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro 2025-12-04T10:52:00.8960292Z 2025-12-04T10:52:00.8960625Z inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling PASSED [3.2406s] [ 1%] 2025-12-04T10:52:00.8961784Z inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1 W1204 10:47:14.743000 79656 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:00.8963237Z W1204 10:47:15.243000 79656 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.8963917Z PASSED [2.7562s] [ 2%] 2025-12-04T10:52:00.8964577Z inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248 PASSED [3.1706s] [ 3%] 2025-12-04T10:52:00.8965751Z inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16 SKIPPED [0.0003s] (bfloat16 atomic add is only supported in fbcode today #97016) [ 4%] 2025-12-04T10:52:00.8966880Z inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel PASSED [0.1097s] [ 5%] 2025-12-04T10:52:00.8967734Z inductor/test_cuda_repro.py::CudaReproTests::test_backward_context PASSED [0.5631s] [ 6%] 2025-12-04T10:52:00.8968597Z inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision PASSED [0.5244s] [ 7%] 2025-12-04T10:52:00.8969486Z inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense PASSED [0.9228s] [ 8%] 2025-12-04T10:52:00.8970635Z inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue SKIPPED [0.0003s] (Skipping triton backend 
only since not big GPU (not enough SM)) [ 9%] 2025-12-04T10:52:00.8971779Z inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel PASSED [0.9067s] [ 10%] 2025-12-04T10:52:00.8972576Z inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index PASSED [0.9723s] [ 11%] 2025-12-04T10:52:00.8973395Z inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms PASSED [0.5192s] [ 12%] 2025-12-04T10:52:00.8974393Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses ('RERUN', {'yellow': True}) [1.3094s] [ 13%] 2025-12-04T10:52:00.8975530Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck! 2025-12-04T10:52:00.8976277Z FileCheck checks: 2025-12-04T10:52:00.8976549Z CHECK-NOT: in_out 2025-12-04T10:52:00.8976827Z ('RERUN', {'yellow': True}) [1.1625s] [ 13%] 2025-12-04T10:52:00.8977589Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck! 2025-12-04T10:52:00.8978334Z FileCheck checks: 2025-12-04T10:52:00.8978581Z CHECK-NOT: in_out 2025-12-04T10:52:00.8978960Z FAILED [1.1610s] [ 13%]You have not run this instance of FileCheck! 
2025-12-04T10:52:00.8979402Z FileCheck checks: 2025-12-04T10:52:00.8979649Z CHECK-NOT: in_out 2025-12-04T10:52:00.8979818Z 2025-12-04T10:52:00.8979823Z 2025-12-04T10:52:00.8979965Z ==================================== RERUNS ==================================== 2025-12-04T10:52:00.8980524Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.8981048Z Traceback (most recent call last): 2025-12-04T10:52:00.8981780Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.8982568Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.8982962Z IndexError: list index out of range 2025-12-04T10:52:00.8983193Z 2025-12-04T10:52:00.8983404Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.8984264Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.8984917Z 2025-12-04T10:52:00.8985179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.8985801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.8986257Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.8987603Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.8988065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.8988771Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.8989338Z graph_break [] 2025-12-04T10:52:00.8989623Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.8990097Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.8991158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation 
natively, skipping 2025-12-04T10:52:00.8992225Z warnings.warn( 2025-12-04T10:52:00.8992656Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.8993186Z Traceback (most recent call last): 2025-12-04T10:52:00.8993919Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.8994707Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.8995105Z IndexError: list index out of range 2025-12-04T10:52:00.8995334Z 2025-12-04T10:52:00.8995543Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.8996396Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.8997042Z 2025-12-04T10:52:00.8997300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.8997920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.8998375Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.8998705Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.8999138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.8999818Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9000393Z graph_break [] 2025-12-04T10:52:00.9000678Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9001371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9002524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9003478Z warnings.warn( 2025-12-04T10:52:00.9003854Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:52:00.9004315Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9004654Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9005092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9005787Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9006354Z graph_break [] 2025-12-04T10:52:00.9006646Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9007115Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9008169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9009114Z warnings.warn( 2025-12-04T10:52:00.9009419Z =================================== FAILURES =================================== 2025-12-04T10:52:00.9009973Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9010483Z Traceback (most recent call last): 2025-12-04T10:52:00.9011224Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9012009Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9012388Z IndexError: list index out of range 2025-12-04T10:52:00.9012632Z 2025-12-04T10:52:00.9013000Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9013867Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9014506Z 2025-12-04T10:52:00.9014787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9015398Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9015869Z frames [('total', 2), ('ok', 2)] 
2025-12-04T10:52:00.9016302Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9016720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9017416Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9017992Z graph_break [] 2025-12-04T10:52:00.9018281Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9018747Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9019824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9020774Z warnings.warn( 2025-12-04T10:52:00.9021139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9021605Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9021937Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9022373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9023049Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9023621Z graph_break [] 2025-12-04T10:52:00.9023904Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9024362Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9025426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9026365Z warnings.warn( 2025-12-04T10:52:00.9026733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9027184Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9027511Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9027935Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9028615Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9029186Z graph_break [] 2025-12-04T10:52:00.9029475Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9029932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9031002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9031950Z warnings.warn( 2025-12-04T10:52:00.9032818Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml - 2025-12-04T10:52:00.9033817Z =========================== short test summary info ============================ 2025-12-04T10:52:00.9034677Z FAILED [1.1610s] inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range 2025-12-04T10:52:00.9035365Z 2025-12-04T10:52:00.9035578Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9036433Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9037073Z 2025-12-04T10:52:00.9037337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9037988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:00.9038511Z ============== 1 failed, 10 passed, 2 skipped, 2 rerun in 17.38s =============== 2025-12-04T10:52:00.9038956Z Got exit code 1 2025-12-04T10:52:00.9039207Z Retrying single test... 
2025-12-04T10:52:00.9039830Z W1204 10:47:39.481000 80332 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9040935Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml 2025-12-04T10:52:00.9041817Z ============================= test session starts ============================== 2025-12-04T10:52:00.9042542Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:00.9043136Z cachedir: .pytest_cache 2025-12-04T10:52:00.9043832Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:00.9044594Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:00.9044949Z configfile: pytest.ini 2025-12-04T10:52:00.9045665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:00.9046525Z collecting ... collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:00.9047463Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9048300Z Running 1 items in this shard 2025-12-04T10:52:00.9048505Z 2025-12-04T10:52:00.9049352Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses W1204 10:47:44.243000 80332 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9050459Z You have not run this instance of FileCheck! 
2025-12-04T10:52:00.9050813Z FileCheck checks: 2025-12-04T10:52:00.9051078Z CHECK-NOT: in_out 2025-12-04T10:52:00.9051366Z ('RERUN', {'yellow': True}) [4.7791s] [100%] 2025-12-04T10:52:00.9052101Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses ('RERUN', {'yellow': True}) [1.1572s] [100%] 2025-12-04T10:52:00.9053243Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck! 2025-12-04T10:52:00.9053985Z FileCheck checks: 2025-12-04T10:52:00.9054235Z CHECK-NOT: in_out 2025-12-04T10:52:00.9054496Z FAILED [1.1580s] [100%] 2025-12-04T10:52:00.9054668Z 2025-12-04T10:52:00.9054820Z ==================================== RERUNS ==================================== 2025-12-04T10:52:00.9055373Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9055884Z Traceback (most recent call last): 2025-12-04T10:52:00.9056622Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9057410Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9057790Z IndexError: list index out of range 2025-12-04T10:52:00.9058033Z 2025-12-04T10:52:00.9058243Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9059093Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9059729Z 2025-12-04T10:52:00.9060000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9060610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9061076Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9061405Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9061942Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 
2025-12-04T10:52:00.9062633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9063175Z graph_break [] 2025-12-04T10:52:00.9063468Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9063932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9065007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9065962Z warnings.warn( 2025-12-04T10:52:00.9066375Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9066970Z Traceback (most recent call last): 2025-12-04T10:52:00.9067706Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9068488Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9068868Z IndexError: list index out of range 2025-12-04T10:52:00.9069110Z 2025-12-04T10:52:00.9069325Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9070175Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9070811Z 2025-12-04T10:52:00.9071083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9071685Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9072149Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9072481Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9073023Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9073732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9074185Z graph_break [] 2025-12-04T10:52:00.9074479Z aten_mm_info 
[('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9074939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9076015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9076965Z warnings.warn( 2025-12-04T10:52:00.9077326Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9077795Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9078130Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9078560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9079246Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9079824Z graph_break [] 2025-12-04T10:52:00.9080114Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9080576Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9081654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9082679Z warnings.warn( 2025-12-04T10:52:00.9082990Z =================================== FAILURES =================================== 2025-12-04T10:52:00.9083535Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9084062Z Traceback (most recent call last): 2025-12-04T10:52:00.9084806Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9085590Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9085985Z IndexError: list index out of range 2025-12-04T10:52:00.9086212Z 2025-12-04T10:52:00.9086433Z To execute this test, run the following from the base 
repo dir: 2025-12-04T10:52:00.9087281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9087922Z 2025-12-04T10:52:00.9088309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9088927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9089397Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9089714Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9090264Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9090956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9091530Z graph_break [] 2025-12-04T10:52:00.9091801Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9092280Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9093353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9094290Z warnings.warn( 2025-12-04T10:52:00.9094667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9095132Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9095465Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9095881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9096570Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9097144Z graph_break [] 2025-12-04T10:52:00.9097421Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9097891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9098957Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9099903Z warnings.warn( 2025-12-04T10:52:00.9100266Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9100731Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9101232Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9101653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9102353Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9102934Z graph_break [] 2025-12-04T10:52:00.9103209Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9103683Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9104749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9105695Z warnings.warn( 2025-12-04T10:52:00.9106549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml - 2025-12-04T10:52:00.9107561Z =========================== short test summary info ============================ 2025-12-04T10:52:00.9108417Z FAILED [1.1580s] inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range 2025-12-04T10:52:00.9109086Z 2025-12-04T10:52:00.9109312Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9110145Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9110798Z 2025-12-04T10:52:00.9111058Z This message can be suppressed by setting 
PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9111635Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:00.9112148Z ================== 1 failed, 95 deselected, 2 rerun in 7.13s =================== 2025-12-04T10:52:00.9112615Z You have not run this instance of FileCheck! 2025-12-04T10:52:00.9112983Z FileCheck checks: 2025-12-04T10:52:00.9113376Z CHECK-NOT: in_out 2025-12-04T10:52:00.9113626Z Got exit code 1 2025-12-04T10:52:00.9113888Z Retrying single test... 2025-12-04T10:52:00.9114510Z W1204 10:47:59.298000 80502 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9115593Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml 2025-12-04T10:52:00.9116419Z ============================= test session starts ============================== 2025-12-04T10:52:00.9117148Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:00.9117736Z cachedir: .pytest_cache 2025-12-04T10:52:00.9118412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:00.9119173Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:00.9119519Z configfile: pytest.ini 2025-12-04T10:52:00.9120236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:00.9121097Z collecting ... collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:00.9122031Z stepcurrent: skipping 12 already run items. 
Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9122935Z Running 1 items in this shard 2025-12-04T10:52:00.9123142Z 2025-12-04T10:52:00.9124005Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses W1204 10:48:04.081000 80502 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9125095Z ('RERUN', {'yellow': True}) [4.7899s] [100%] 2025-12-04T10:52:00.9125864Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck! 2025-12-04T10:52:00.9126612Z FileCheck checks: 2025-12-04T10:52:00.9126866Z CHECK-NOT: in_out 2025-12-04T10:52:00.9127155Z ('RERUN', {'yellow': True}) [1.1617s] [100%] 2025-12-04T10:52:00.9127920Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck! 2025-12-04T10:52:00.9128661Z FileCheck checks: 2025-12-04T10:52:00.9128908Z CHECK-NOT: in_out 2025-12-04T10:52:00.9129167Z FAILED [1.1652s] [100%] 2025-12-04T10:52:00.9129338Z 2025-12-04T10:52:00.9129490Z ==================================== RERUNS ==================================== 2025-12-04T10:52:00.9130031Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9130554Z Traceback (most recent call last): 2025-12-04T10:52:00.9131298Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9132074Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9132475Z IndexError: list index out of range 2025-12-04T10:52:00.9132719Z 2025-12-04T10:52:00.9132930Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9133788Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 
2025-12-04T10:52:00.9134430Z 2025-12-04T10:52:00.9134691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9135316Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9135792Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9136130Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9136677Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9137376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9137830Z graph_break [] 2025-12-04T10:52:00.9138099Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9138709Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9139788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9140742Z warnings.warn( 2025-12-04T10:52:00.9141157Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9158834Z Traceback (most recent call last): 2025-12-04T10:52:00.9159890Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9160694Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9161105Z IndexError: list index out of range 2025-12-04T10:52:00.9161340Z 2025-12-04T10:52:00.9161555Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9162508Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9163151Z 2025-12-04T10:52:00.9163432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9164067Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:52:00.9164535Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9164868Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9165429Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9166118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9166575Z graph_break [] 2025-12-04T10:52:00.9166860Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9167322Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9168394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9169325Z warnings.warn( 2025-12-04T10:52:00.9169694Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9170142Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9170467Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9170881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9171554Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9172132Z graph_break [] 2025-12-04T10:52:00.9172405Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9172861Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9173924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9174869Z warnings.warn( 2025-12-04T10:52:00.9175176Z =================================== FAILURES =================================== 2025-12-04T10:52:00.9175714Z ______________ 
CudaReproTests.test_dont_inplace_disjoint_accesses ______________ 2025-12-04T10:52:00.9176241Z Traceback (most recent call last): 2025-12-04T10:52:00.9176984Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9177759Z FileCheck().check_not("in_out").run(code[0]) 2025-12-04T10:52:00.9178151Z IndexError: list index out of range 2025-12-04T10:52:00.9178391Z 2025-12-04T10:52:00.9178602Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9179455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9180091Z 2025-12-04T10:52:00.9180351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9181058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9181530Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9181865Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9182407Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9183106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9183563Z graph_break [] 2025-12-04T10:52:00.9183837Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9184377Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9185453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9186405Z warnings.warn( 2025-12-04T10:52:00.9186768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9187232Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9187563Z stats [('calls_captured', 66)] 
2025-12-04T10:52:00.9187979Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9188672Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9189243Z graph_break [] 2025-12-04T10:52:00.9189516Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9189989Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9191063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9192010Z warnings.warn( 2025-12-04T10:52:00.9192372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9192833Z frames [('total', 2), ('ok', 2)] 2025-12-04T10:52:00.9193170Z stats [('calls_captured', 66)] 2025-12-04T10:52:00.9193585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T10:52:00.9194275Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)] 2025-12-04T10:52:00.9194851Z graph_break [] 2025-12-04T10:52:00.9195136Z aten_mm_info [('aten.mm_32768_2048_2048', 3)] 2025-12-04T10:52:00.9195594Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9196665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T10:52:00.9197619Z warnings.warn( 2025-12-04T10:52:00.9198476Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml - 2025-12-04T10:52:00.9199490Z =========================== short test summary info ============================ 2025-12-04T10:52:00.9200357Z FAILED [1.1652s] 
inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range 2025-12-04T10:52:00.9201313Z 2025-12-04T10:52:00.9201547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9202453Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9203105Z 2025-12-04T10:52:00.9203368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9203960Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:00.9204476Z ================== 1 failed, 95 deselected, 2 rerun in 7.15s =================== 2025-12-04T10:52:00.9204943Z You have not run this instance of FileCheck! 2025-12-04T10:52:00.9205319Z FileCheck checks: 2025-12-04T10:52:00.9205584Z CHECK-NOT: in_out 2025-12-04T10:52:00.9205834Z Got exit code 1 2025-12-04T10:52:00.9206622Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses 2025-12-04T10:52:00.9207595Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:52:00.9208574Z W1204 10:48:19.136000 80672 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9209655Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml 2025-12-04T10:52:00.9210576Z ============================= test session starts ============================== 2025-12-04T10:52:00.9211225Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:00.9211820Z cachedir: .pytest_cache 2025-12-04T10:52:00.9212505Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T10:52:00.9213281Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:00.9213630Z configfile: pytest.ini 2025-12-04T10:52:00.9214386Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:00.9215351Z collecting ... collected 96 items / 13 deselected / 83 selected 2025-12-04T10:52:00.9215843Z stepcurrent: skipping 13 already run items. 2025-12-04T10:52:00.9216225Z Running 83 items in this shard 2025-12-04T10:52:00.9216438Z 2025-12-04T10:52:00.9216800Z inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue PASSED [2.1412s] [ 1%] 2025-12-04T10:52:00.9217703Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions PASSED [1.3498s] [ 2%] 2025-12-04T10:52:00.9218572Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes PASSED [1.2510s] [ 3%] 2025-12-04T10:52:00.9219422Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs PASSED [0.7824s] [ 4%] 2025-12-04T10:52:00.9220317Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding PASSED [0.2922s] [ 6%] 2025-12-04T10:52:00.9221328Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned ('RERUN', {'yellow': True}) [0.0520s] [ 7%] 2025-12-04T10:52:00.9222468Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned ('RERUN', {'yellow': True}) [0.0241s] [ 7%] 2025-12-04T10:52:00.9223510Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned FAILED [0.0230s] [ 7%] 2025-12-04T10:52:00.9224075Z 2025-12-04T10:52:00.9224216Z ==================================== RERUNS ==================================== 2025-12-04T10:52:00.9224768Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:00.9225299Z Traceback (most recent call last): 2025-12-04T10:52:00.9226038Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9226850Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:00.9227616Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:00.9228333Z result = fn(*args, **kwargs) 2025-12-04T10:52:00.9229010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:00.9229734Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9230406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:00.9231117Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:00.9231836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:00.9232544Z result = self._inner_convert( 2025-12-04T10:52:00.9233305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:00.9233986Z result = _compile( 2025-12-04T10:52:00.9234626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:00.9235461Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9236282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:00.9237066Z return function(*args, **kwargs) 2025-12-04T10:52:00.9237795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:00.9238565Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9239319Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:00.9240071Z dynamo_output = compile_frame( 2025-12-04T10:52:00.9240786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:00.9241638Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:00.9242650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:00.9243579Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:00.9244388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:00.9245085Z tracer_output = trace_frame( 2025-12-04T10:52:00.9245734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:00.9246411Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9247087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:00.9247772Z run_tracer() 2025-12-04T10:52:00.9248389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:00.9249083Z tracer.run() 2025-12-04T10:52:00.9249675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:00.9250365Z while self.step(): 2025-12-04T10:52:00.9250997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:00.9251740Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:00.9252473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 
2025-12-04T10:52:00.9253182Z return inner_fn(self, inst) 2025-12-04T10:52:00.9253916Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:00.9254677Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:00.9255426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:00.9256320Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:00.9257228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:00.9258055Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:00.9258858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:00.9259613Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:00.9260362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:00.9261287Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:00.9262169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:00.9262949Z out = _wrap_fx_proxy( 2025-12-04T10:52:00.9263643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:00.9264547Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:00.9265454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:00.9266308Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:00.9267144Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:00.9267853Z ret_val = wrap_fake_exception( 2025-12-04T10:52:00.9268565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:00.9269267Z return fn() 2025-12-04T10:52:00.9269820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:00.9270570Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:00.9271311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:00.9272183Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:00.9272927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:00.9273706Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9276180Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.') 2025-12-04T10:52:00.9278438Z 2025-12-04T10:52:00.9278538Z from user code: 2025-12-04T10:52:00.9279042Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:00.9279632Z return F.scaled_dot_product_attention( 2025-12-04T10:52:00.9279894Z 2025-12-04T10:52:00.9280598Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:00.9281435Z 2025-12-04T10:52:00.9281440Z 2025-12-04T10:52:00.9281656Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9282591Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9283246Z 2025-12-04T10:52:00.9283523Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9284134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9284601Z frames [('total', 1)] 2025-12-04T10:52:00.9284999Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9286505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:00.9288007Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9290009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 
2025-12-04T10:52:00.9291923Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9293410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:00.9294964Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9296443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:00.9297918Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9299397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9300998Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9302497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:00.9303989Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9304611Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:00.9305128Z Traceback (most recent call last): 2025-12-04T10:52:00.9305882Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9306685Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:00.9307439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:00.9308142Z result = fn(*args, **kwargs) 2025-12-04T10:52:00.9308842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:00.9309565Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9310232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:00.9310946Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:00.9311669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:00.9312373Z result = self._inner_convert( 2025-12-04T10:52:00.9313035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:00.9313730Z result = _compile( 2025-12-04T10:52:00.9314358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:00.9315184Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9316005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:00.9316722Z return function(*args, **kwargs) 2025-12-04T10:52:00.9317446Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:00.9318217Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9319103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:00.9319856Z dynamo_output = compile_frame( 2025-12-04T10:52:00.9320570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:00.9321405Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:00.9322419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:00.9323524Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:00.9324327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:00.9325024Z tracer_output = trace_frame( 2025-12-04T10:52:00.9325680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:00.9326361Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9327021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:00.9327719Z run_tracer() 2025-12-04T10:52:00.9328336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:00.9329031Z tracer.run() 2025-12-04T10:52:00.9329629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:00.9330318Z while self.step(): 2025-12-04T10:52:00.9330949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in 
step 2025-12-04T10:52:00.9331678Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:00.9332419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:00.9333130Z return inner_fn(self, inst) 2025-12-04T10:52:00.9333859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:00.9334623Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:00.9335372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:00.9336261Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:00.9337177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:00.9338006Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:00.9338809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:00.9339567Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:00.9340300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:00.9341156Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:00.9342028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:00.9342805Z out = _wrap_fx_proxy( 2025-12-04T10:52:00.9343501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:00.9344409Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:00.9345247Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:00.9346100Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:00.9347018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:00.9347723Z ret_val = wrap_fake_exception( 2025-12-04T10:52:00.9348421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:00.9349112Z return fn() 2025-12-04T10:52:00.9349681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:00.9350488Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:00.9351231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:00.9351977Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:00.9352721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:00.9353492Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9355965Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.') 2025-12-04T10:52:00.9358204Z 2025-12-04T10:52:00.9358316Z from user code: 2025-12-04T10:52:00.9358932Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:00.9359538Z return F.scaled_dot_product_attention( 2025-12-04T10:52:00.9359790Z 2025-12-04T10:52:00.9360512Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:00.9361340Z 2025-12-04T10:52:00.9361345Z 2025-12-04T10:52:00.9361570Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9362494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9363161Z 2025-12-04T10:52:00.9363423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9364047Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9364503Z frames [('total', 1)] 2025-12-04T10:52:00.9364882Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9366405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:00.9367909Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9369835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 
2025-12-04T10:52:00.9371739Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9373220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:00.9374787Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9376270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:00.9377747Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9379215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9380757Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9382252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:00.9383741Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9384290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9384751Z frames [('total', 1)] 2025-12-04T10:52:00.9385147Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9386639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:00.9388122Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9390055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:00.9391961Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9393442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:00.9394923Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9396369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 
2025-12-04T10:52:00.9397841Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9399321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9400800Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9402485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:00.9403984Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9404473Z =================================== FAILURES =================================== 2025-12-04T10:52:00.9405032Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:00.9405551Z Traceback (most recent call last): 2025-12-04T10:52:00.9406416Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9407234Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:00.9407993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:00.9408696Z result = fn(*args, **kwargs) 2025-12-04T10:52:00.9409394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:00.9410212Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9410870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:00.9411597Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:00.9412323Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:00.9413028Z result = self._inner_convert( 2025-12-04T10:52:00.9413697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:00.9414386Z result = _compile( 2025-12-04T10:52:00.9415018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:00.9415831Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9416663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:00.9417379Z return function(*args, **kwargs) 2025-12-04T10:52:00.9418097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:00.9418856Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9419621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:00.9420372Z dynamo_output = compile_frame( 2025-12-04T10:52:00.9421076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:00.9421929Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:00.9422885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:00.9423822Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:00.9424617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:00.9425333Z tracer_output = trace_frame( 2025-12-04T10:52:00.9425982Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:00.9426657Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9427322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:00.9428020Z run_tracer() 2025-12-04T10:52:00.9428635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:00.9429316Z tracer.run() 2025-12-04T10:52:00.9429922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:00.9430608Z while self.step(): 2025-12-04T10:52:00.9431246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:00.9431975Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:00.9432717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:00.9433430Z return inner_fn(self, inst) 2025-12-04T10:52:00.9434238Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:00.9435017Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:00.9435772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:00.9436671Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:00.9437567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:00.9438466Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:00.9439266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in 
call_function 2025-12-04T10:52:00.9440017Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:00.9440754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:00.9441606Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:00.9442545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:00.9443307Z out = _wrap_fx_proxy( 2025-12-04T10:52:00.9444015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:00.9444920Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:00.9445766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:00.9446613Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:00.9447469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:00.9448172Z ret_val = wrap_fake_exception( 2025-12-04T10:52:00.9448875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:00.9449565Z return fn() 2025-12-04T10:52:00.9450131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:00.9450881Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:00.9451610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:00.9452363Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:00.9453103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 
2025-12-04T10:52:00.9453864Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9456332Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.') 2025-12-04T10:52:00.9458583Z 2025-12-04T10:52:00.9458682Z from user code: 2025-12-04T10:52:00.9459197Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:00.9459921Z return F.scaled_dot_product_attention( 2025-12-04T10:52:00.9460173Z 2025-12-04T10:52:00.9460890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:00.9461713Z 2025-12-04T10:52:00.9461808Z 2025-12-04T10:52:00.9462025Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9462909Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9463580Z 2025-12-04T10:52:00.9463845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9464471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9464979Z frames [('total', 1)] 2025-12-04T10:52:00.9465373Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9466880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:00.9468402Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9470334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:00.9472236Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9473724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 
2025-12-04T10:52:00.9475204Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9476732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:00.9478207Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9479675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9481157Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9482714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:00.9484212Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9484765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9485224Z frames [('total', 1)] 2025-12-04T10:52:00.9485617Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9487108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 
2025-12-04T10:52:00.9488608Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9490538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:00.9492516Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9494000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:00.9495470Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9496943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:00.9498475Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9499956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9501596Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9503079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:00.9504564Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9505126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9505583Z frames [('total', 1)] 2025-12-04T10:52:00.9506460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml - 2025-12-04T10:52:00.9507475Z =========================== short test summary info ============================ 2025-12-04T10:52:00.9510379Z FAILED [0.0230s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.') 2025-12-04T10:52:00.9513095Z 2025-12-04T10:52:00.9513194Z from user code: 2025-12-04T10:52:00.9513688Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:00.9514279Z return F.scaled_dot_product_attention( 2025-12-04T10:52:00.9514546Z 2025-12-04T10:52:00.9515252Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). 
For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:00.9516089Z 2025-12-04T10:52:00.9516093Z 2025-12-04T10:52:00.9516305Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9517173Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9517829Z 2025-12-04T10:52:00.9518090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9518676Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:00.9519203Z ============= 1 failed, 5 passed, 13 deselected, 2 rerun in 5.96s ============== 2025-12-04T10:52:00.9519653Z Got exit code 1 2025-12-04T10:52:00.9519904Z Retrying single test... 2025-12-04T10:52:00.9520657Z W1204 10:48:40.163000 80955 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:00.9521762Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml 2025-12-04T10:52:00.9522647Z ============================= test session starts ============================== 2025-12-04T10:52:00.9523306Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:00.9523902Z cachedir: .pytest_cache 2025-12-04T10:52:00.9524605Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:00.9525453Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:00.9525805Z configfile: pytest.ini 2025-12-04T10:52:00.9526526Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:00.9527406Z collecting ... 
collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:00.9528350Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9529199Z Running 1 items in this shard 2025-12-04T10:52:00.9529404Z 2025-12-04T10:52:00.9530333Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:42.639934199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:00.9531366Z 2025-12-04T10:52:00.9531514Z ('RERUN', {'yellow': True}) [15.2599s] [100%] 2025-12-04T10:52:00.9532665Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:57.838425527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:00.9533709Z 2025-12-04T10:52:00.9533838Z ('RERUN', {'yellow': True}) [0.0265s] [100%] 2025-12-04T10:52:00.9534997Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:57.863701046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:52:00.9536025Z 2025-12-04T10:52:00.9536136Z FAILED [0.0229s] [100%] 2025-12-04T10:52:00.9536307Z 2025-12-04T10:52:00.9536448Z ==================================== RERUNS ==================================== 2025-12-04T10:52:00.9537005Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:00.9537535Z Traceback (most recent call last): 2025-12-04T10:52:00.9538282Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9539078Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:00.9539837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:00.9540550Z result = fn(*args, **kwargs) 2025-12-04T10:52:00.9541234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:00.9541949Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9542612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:00.9543338Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:00.9544038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:00.9544744Z result = self._inner_convert( 2025-12-04T10:52:00.9545417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:00.9546100Z result = _compile( 2025-12-04T10:52:00.9546720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:00.9547622Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9548454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 
97, in wrapper_function 2025-12-04T10:52:00.9549153Z return function(*args, **kwargs) 2025-12-04T10:52:00.9549872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:00.9550649Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9551420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:00.9552213Z dynamo_output = compile_frame( 2025-12-04T10:52:00.9552929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:00.9553784Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:00.9554727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:00.9555668Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:00.9556464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:00.9557176Z tracer_output = trace_frame( 2025-12-04T10:52:00.9557812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:00.9558502Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9559173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:00.9559874Z run_tracer() 2025-12-04T10:52:00.9560478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:00.9561170Z tracer.run() 2025-12-04T10:52:00.9561783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:00.9562527Z while self.step(): 
2025-12-04T10:52:00.9563168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:00.9563914Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:00.9564661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:00.9565370Z return inner_fn(self, inst) 2025-12-04T10:52:00.9566104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:00.9566885Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:00.9567626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:00.9568522Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:00.9569439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:00.9570275Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:00.9571067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:00.9571819Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:00.9572573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:00.9573439Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:00.9574299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:00.9575073Z out = _wrap_fx_proxy( 2025-12-04T10:52:00.9575857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:00.9576753Z example_value = 
get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:00.9577589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:00.9578445Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:00.9579298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:00.9580047Z ret_val = wrap_fake_exception( 2025-12-04T10:52:00.9580752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:00.9581455Z return fn() 2025-12-04T10:52:00.9582010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:00.9582764Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:00.9583502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:00.9584258Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:00.9584990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:00.9585758Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9684946Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector 
>&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:00.9783141Z 2025-12-04T10:52:00.9783243Z from user code: 2025-12-04T10:52:00.9783738Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:00.9784336Z return F.scaled_dot_product_attention( 2025-12-04T10:52:00.9784601Z 2025-12-04T10:52:00.9785302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:00.9786152Z 2025-12-04T10:52:00.9786156Z 2025-12-04T10:52:00.9786375Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:00.9787373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9788037Z 2025-12-04T10:52:00.9788321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:00.9788930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:00.9789389Z frames [('total', 1)] 2025-12-04T10:52:00.9789776Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:00.9791278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:00.9792799Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9794863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 
2025-12-04T10:52:00.9796789Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9798274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:00.9799744Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9801372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:00.9803019Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9804504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:00.9805966Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9807454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:00.9808951Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9810408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:52:00.9811864Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9812468Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:00.9813000Z Traceback (most recent call last): 2025-12-04T10:52:00.9813747Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:00.9814539Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:00.9815298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:00.9816012Z result = fn(*args, **kwargs) 2025-12-04T10:52:00.9816705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:00.9817412Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9818076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:00.9818811Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:00.9819533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:00.9820230Z result = self._inner_convert( 2025-12-04T10:52:00.9820912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:00.9821601Z result = _compile( 2025-12-04T10:52:00.9822222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:00.9823055Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9823894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:00.9824604Z return function(*args, **kwargs) 2025-12-04T10:52:00.9825308Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:00.9826190Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:00.9826966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:00.9827699Z dynamo_output = compile_frame( 2025-12-04T10:52:00.9828416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:00.9829263Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:00.9830228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:00.9831302Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:00.9832116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:00.9832824Z tracer_output = trace_frame( 2025-12-04T10:52:00.9833482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:00.9834161Z return fn(*args, **kwargs) 2025-12-04T10:52:00.9834839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:00.9835526Z run_tracer() 2025-12-04T10:52:00.9836146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:00.9836846Z tracer.run() 2025-12-04T10:52:00.9837450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:00.9838146Z while self.step(): 2025-12-04T10:52:00.9838791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in 
step 2025-12-04T10:52:00.9839538Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:00.9840275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:00.9840984Z return inner_fn(self, inst) 2025-12-04T10:52:00.9841717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:00.9842565Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:00.9843304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:00.9844201Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:00.9845127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:00.9845948Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:00.9846760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:00.9847513Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:00.9848258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:00.9849100Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:00.9849979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:00.9850755Z out = _wrap_fx_proxy( 2025-12-04T10:52:00.9851467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:00.9852361Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:00.9853204Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:00.9854061Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:00.9855005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:00.9855710Z ret_val = wrap_fake_exception( 2025-12-04T10:52:00.9856420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:00.9857131Z return fn() 2025-12-04T10:52:00.9857690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:00.9858505Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:00.9859248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:00.9859999Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:00.9860727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:00.9861496Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:00.9960528Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector 
>&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0058335Z 2025-12-04T10:52:01.0058438Z from user code: 2025-12-04T10:52:01.0058935Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0059543Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0059796Z 2025-12-04T10:52:01.0060508Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0061346Z 2025-12-04T10:52:01.0061350Z 2025-12-04T10:52:01.0061568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0062439Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0063092Z 2025-12-04T10:52:01.0063368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0063982Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0064441Z frames [('total', 1)] 2025-12-04T10:52:01.0064831Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0066348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0067838Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0069777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 
2025-12-04T10:52:01.0071695Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0073184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0074669Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0076248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0077737Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0079221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0080782Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0082315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0083811Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0085262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:52:01.0086712Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0087257Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0087722Z frames [('total', 1)] 2025-12-04T10:52:01.0088113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0089616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0091098Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0093020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0094945Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0096430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0097906Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0099362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 
2025-12-04T10:52:01.0100969Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0102456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0103942Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0105422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0106915Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0107508Z =================================== FAILURES =================================== 2025-12-04T10:52:01.0108070Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:01.0108587Z Traceback (most recent call last): 2025-12-04T10:52:01.0109335Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0110142Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:01.0110980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:01.0111684Z result = fn(*args, **kwargs) 2025-12-04T10:52:01.0112379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:01.0113100Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0113757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:01.0114490Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:01.0115206Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:01.0115913Z result = self._inner_convert( 2025-12-04T10:52:01.0116578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:01.0117265Z result = _compile( 2025-12-04T10:52:01.0117902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:01.0118717Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0119552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:01.0120264Z return function(*args, **kwargs) 2025-12-04T10:52:01.0120986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:01.0121750Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0122582Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:01.0123335Z dynamo_output = compile_frame( 2025-12-04T10:52:01.0124056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:01.0124897Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:01.0125851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:01.0126791Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:01.0127584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:01.0128296Z tracer_output = trace_frame( 2025-12-04T10:52:01.0128944Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:01.0129621Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0130279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:01.0130976Z run_tracer() 2025-12-04T10:52:01.0131593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:01.0132293Z tracer.run() 2025-12-04T10:52:01.0132888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:01.0133572Z while self.step(): 2025-12-04T10:52:01.0134203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:01.0135016Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:01.0135764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:01.0136474Z return inner_fn(self, inst) 2025-12-04T10:52:01.0137210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:01.0137971Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:01.0138784Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:01.0139675Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:01.0140572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:01.0141410Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:01.0142217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in 
call_function 2025-12-04T10:52:01.0142970Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:01.0143699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:01.0144562Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:01.0145439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:01.0146218Z out = _wrap_fx_proxy( 2025-12-04T10:52:01.0146911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:01.0147818Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:01.0148664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:01.0149506Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:01.0150364Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:01.0151062Z ret_val = wrap_fake_exception( 2025-12-04T10:52:01.0151766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:01.0152462Z return fn() 2025-12-04T10:52:01.0153031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:01.0153779Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:01.0154509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:01.0155260Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:01.0156008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 
2025-12-04T10:52:01.0156773Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0255887Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from 
RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core 
from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 
_PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0354089Z 2025-12-04T10:52:01.0354194Z from user code: 2025-12-04T10:52:01.0354697Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0355404Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0355675Z 2025-12-04T10:52:01.0356376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0357203Z 2025-12-04T10:52:01.0357222Z 2025-12-04T10:52:01.0357437Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0358316Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0359056Z 2025-12-04T10:52:01.0359318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0359944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0360510Z frames [('total', 1)] 2025-12-04T10:52:01.0360905Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0362464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at 
/var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0363968Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0366007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0367924Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0369399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0370881Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0372354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0373824Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0375307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0376769Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0378263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. 
(Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0379754Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0381201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:52:01.0382641Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0383213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0383674Z frames [('total', 1)] 2025-12-04T10:52:01.0384058Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0385716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0387208Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0388789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0389078Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0390208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 
2025-12-04T10:52:01.0390433Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0391561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0391767Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0392917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0393129Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0394291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:01.0394498Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0394728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0394834Z frames [('total', 1)] 2025-12-04T10:52:01.0395541Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml - 2025-12-04T10:52:01.0395734Z =========================== short test summary info ============================ 2025-12-04T10:52:01.0495547Z FAILED [0.0229s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector 
>&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0496033Z 2025-12-04T10:52:01.0496138Z from user code: 2025-12-04T10:52:01.0496489Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0496623Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0496633Z 2025-12-04T10:52:01.0497349Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0497354Z 2025-12-04T10:52:01.0497359Z 2025-12-04T10:52:01.0497575Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0498157Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0498178Z 2025-12-04T10:52:01.0498444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0498626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:01.0498839Z ================== 1 failed, 95 deselected, 2 rerun in 15.35s ================== 2025-12-04T10:52:01.0498937Z Got exit code 1 2025-12-04T10:52:01.0499042Z Retrying single test... 
2025-12-04T10:52:01.0499552Z W1204 10:49:07.528000 81075 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.0500079Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml 2025-12-04T10:52:01.0500256Z ============================= test session starts ============================== 2025-12-04T10:52:01.0500607Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:01.0500714Z cachedir: .pytest_cache 2025-12-04T10:52:01.0501501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:01.0501627Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:01.0501736Z configfile: pytest.ini 2025-12-04T10:52:01.0502288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:01.0502507Z collecting ... collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:01.0503150Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0503264Z Running 1 items in this shard 2025-12-04T10:52:01.0503269Z 2025-12-04T10:52:01.0504189Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:09.019090885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:52:01.0504209Z 2025-12-04T10:52:01.0504338Z ('RERUN', {'yellow': True}) [14.9005s] [100%] 2025-12-04T10:52:01.0505243Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:24.857973452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.0505249Z 2025-12-04T10:52:01.0505392Z ('RERUN', {'yellow': True}) [0.0258s] [100%] 2025-12-04T10:52:01.0506292Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:24.882090558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.0506296Z 2025-12-04T10:52:01.0506408Z FAILED [0.0219s] [100%] 2025-12-04T10:52:01.0506412Z 2025-12-04T10:52:01.0506555Z ==================================== RERUNS ==================================== 2025-12-04T10:52:01.0506824Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:01.0506957Z Traceback (most recent call last): 2025-12-04T10:52:01.0507472Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0507644Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:01.0508113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:01.0508228Z result = fn(*args, **kwargs) 2025-12-04T10:52:01.0508716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:01.0508825Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0509277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:01.0509534Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:01.0509986Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:01.0510115Z result = self._inner_convert( 2025-12-04T10:52:01.0510563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:01.0510663Z result = _compile( 2025-12-04T10:52:01.0511126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:01.0511456Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0511910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:01.0512036Z return function(*args, **kwargs) 2025-12-04T10:52:01.0512519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:01.0512684Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0513166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:01.0513281Z dynamo_output = compile_frame( 2025-12-04T10:52:01.0513774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:01.0514005Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:01.0514607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:01.0514815Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:01.0515271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:01.0515404Z tracer_output = trace_frame( 2025-12-04T10:52:01.0515827Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:01.0515936Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0516407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:01.0516501Z run_tracer() 2025-12-04T10:52:01.0516969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:01.0517071Z tracer.run() 2025-12-04T10:52:01.0517512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:01.0517627Z while self.step(): 2025-12-04T10:52:01.0518074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:01.0518225Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:01.0518693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:01.0518807Z return inner_fn(self, inst) 2025-12-04T10:52:01.0519330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:01.0519452Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:01.0519950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:01.0520222Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:01.0520735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:01.0520926Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:01.0521475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in 
call_function 2025-12-04T10:52:01.0521594Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:01.0522175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:01.0522391Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:01.0522918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:01.0523101Z out = _wrap_fx_proxy( 2025-12-04T10:52:01.0523608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:01.0523876Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:01.0524316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:01.0524589Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:01.0525042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:01.0525159Z ret_val = wrap_fake_exception( 2025-12-04T10:52:01.0525645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:01.0525737Z return fn() 2025-12-04T10:52:01.0526147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:01.0526358Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:01.0526768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:01.0526965Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:01.0527393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 
2025-12-04T10:52:01.0527602Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0626263Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from 
RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core 
from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 
_PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0626730Z 2025-12-04T10:52:01.0626833Z from user code: 2025-12-04T10:52:01.0627172Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0627315Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0627320Z 2025-12-04T10:52:01.0628024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0628030Z 2025-12-04T10:52:01.0628035Z 2025-12-04T10:52:01.0628259Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0628784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0628793Z 2025-12-04T10:52:01.0629062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0629293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0629396Z frames [('total', 1)] 2025-12-04T10:52:01.0629621Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0630843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at 
/var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0631057Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0632656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0632928Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0634078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0634285Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0635418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0635624Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0636753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0636973Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0638117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. 
(Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0638336Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0639432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:52:01.0639634Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0639923Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:01.0640044Z Traceback (most recent call last): 2025-12-04T10:52:01.0640566Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0640724Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:01.0641197Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:01.0641320Z result = fn(*args, **kwargs) 2025-12-04T10:52:01.0641792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:01.0641900Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0642424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:01.0654068Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:01.0654645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:01.0654766Z result = self._inner_convert( 2025-12-04T10:52:01.0655240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:01.0655346Z result = _compile( 
2025-12-04T10:52:01.0655975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:01.0656217Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0656675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function 2025-12-04T10:52:01.0656810Z return function(*args, **kwargs) 2025-12-04T10:52:01.0657290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:01.0657515Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0658014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:01.0658132Z dynamo_output = compile_frame( 2025-12-04T10:52:01.0658626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:01.0658859Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:01.0659451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:01.0659676Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:01.0660137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:01.0660265Z tracer_output = trace_frame( 2025-12-04T10:52:01.0660697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:01.0660811Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0661285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:01.0661381Z run_tracer() 2025-12-04T10:52:01.0661840Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:01.0661952Z tracer.run() 2025-12-04T10:52:01.0662397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:01.0662514Z while self.step(): 2025-12-04T10:52:01.0662959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:01.0663111Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:01.0663585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:01.0663698Z return inner_fn(self, inst) 2025-12-04T10:52:01.0664211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:01.0664345Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:01.0664843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:01.0665097Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:01.0665625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:01.0665801Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:01.0666305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:01.0666427Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:01.0666928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:01.0667152Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:01.0667823Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:01.0667948Z out = _wrap_fx_proxy( 2025-12-04T10:52:01.0668454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:01.0668711Z example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:01.0669165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:01.0669512Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:01.0669966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:01.0670083Z ret_val = wrap_fake_exception( 2025-12-04T10:52:01.0670553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:01.0670664Z return fn() 2025-12-04T10:52:01.0671075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda> 2025-12-04T10:52:01.0671269Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:01.0671695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:01.0671891Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:01.0672314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:01.0672527Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0770920Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), 
dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void 
c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 
_PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain 
from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0771382Z 2025-12-04T10:52:01.0771500Z from user code: 2025-12-04T10:52:01.0771836Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0771968Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0771979Z 2025-12-04T10:52:01.0772699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0772705Z 2025-12-04T10:52:01.0772709Z 2025-12-04T10:52:01.0772922Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0773466Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0773472Z 2025-12-04T10:52:01.0773736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0773967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0774071Z frames [('total', 1)] 2025-12-04T10:52:01.0774284Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0775458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 
2025-12-04T10:52:01.0775677Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0777340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0777554Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0778688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0778966Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0780093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0780311Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0781439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0781656Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0782794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:01.0783002Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0784114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T10:52:01.0784326Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0784553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0784657Z frames [('total', 1)] 2025-12-04T10:52:01.0784871Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0786025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0786234Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0787964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0788172Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0789300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 
2025-12-04T10:52:01.0789519Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0790649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0790866Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0792060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0792276Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0793418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 
2025-12-04T10:52:01.0793675Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0793831Z =================================== FAILURES =================================== 2025-12-04T10:52:01.0794099Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________ 2025-12-04T10:52:01.0794220Z Traceback (most recent call last): 2025-12-04T10:52:01.0794749Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0794906Z out, code = run_and_get_code(f_compiled, *inputs) 2025-12-04T10:52:01.0795389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code 2025-12-04T10:52:01.0795501Z result = fn(*args, **kwargs) 2025-12-04T10:52:01.0795975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T10:52:01.0796100Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0796559Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__ 2025-12-04T10:52:01.0796708Z result = self._torchdynamo_orig_backend( 2025-12-04T10:52:01.0797159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__ 2025-12-04T10:52:01.0797273Z result = self._inner_convert( 2025-12-04T10:52:01.0797735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__ 2025-12-04T10:52:01.0797835Z result = _compile( 2025-12-04T10:52:01.0798284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile 2025-12-04T10:52:01.0798531Z guarded_code, tracer_output = compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0798982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in 
wrapper_function 2025-12-04T10:52:01.0799115Z return function(*args, **kwargs) 2025-12-04T10:52:01.0799595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner 2025-12-04T10:52:01.0799742Z return _compile_inner(code, one_graph, hooks) 2025-12-04T10:52:01.0800237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner 2025-12-04T10:52:01.0800356Z dynamo_output = compile_frame( 2025-12-04T10:52:01.0800998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame 2025-12-04T10:52:01.0801242Z bytecode, tracer_output = transform_code_object(code, transform) 2025-12-04T10:52:01.0801828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object 2025-12-04T10:52:01.0802129Z tracer_output = transformations(instructions, code_options) 2025-12-04T10:52:01.0802604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform 2025-12-04T10:52:01.0802719Z tracer_output = trace_frame( 2025-12-04T10:52:01.0803159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn 2025-12-04T10:52:01.0803267Z return fn(*args, **kwargs) 2025-12-04T10:52:01.0803869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame 2025-12-04T10:52:01.0803966Z run_tracer() 2025-12-04T10:52:01.0804424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer 2025-12-04T10:52:01.0804535Z tracer.run() 2025-12-04T10:52:01.0804979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run 2025-12-04T10:52:01.0805082Z while self.step(): 
2025-12-04T10:52:01.0805549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step 2025-12-04T10:52:01.0805779Z self.dispatch_table[inst.opcode](self, inst) 2025-12-04T10:52:01.0806246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper 2025-12-04T10:52:01.0806358Z return inner_fn(self, inst) 2025-12-04T10:52:01.0806874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW 2025-12-04T10:52:01.0807011Z self.call_function(fn, args, kwargs) 2025-12-04T10:52:01.0807505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function 2025-12-04T10:52:01.0807758Z self.push(fn.call_function(self, args, kwargs)) # type: ignore[arg-type] 2025-12-04T10:52:01.0808289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward 2025-12-04T10:52:01.0808470Z return getattr(self.realize(), name)(*args, **kwargs) 2025-12-04T10:52:01.0808972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function 2025-12-04T10:52:01.0809091Z tensor_variable = wrap_fx_proxy( 2025-12-04T10:52:01.0809591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy 2025-12-04T10:52:01.0809822Z return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs) 2025-12-04T10:52:01.0810350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls 2025-12-04T10:52:01.0810467Z out = _wrap_fx_proxy( 2025-12-04T10:52:01.0810972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy 2025-12-04T10:52:01.0811225Z example_value = 
get_fake_value(proxy.node, tx, allow_non_graph_fake=True) 2025-12-04T10:52:01.0811685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value 2025-12-04T10:52:01.0811954Z raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None 2025-12-04T10:52:01.0812390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value 2025-12-04T10:52:01.0812517Z ret_val = wrap_fake_exception( 2025-12-04T10:52:01.0812990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception 2025-12-04T10:52:01.0813095Z return fn() 2025-12-04T10:52:01.0813506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in 2025-12-04T10:52:01.0813703Z lambda: run_node(tx.output, node, args, kwargs, nnmodule) 2025-12-04T10:52:01.0814124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node 2025-12-04T10:52:01.0814325Z raise RuntimeError(make_error_message(e)).with_traceback( 2025-12-04T10:52:01.0814745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node 2025-12-04T10:52:01.0814955Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0913854Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector 
>&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.0914311Z 2025-12-04T10:52:01.0914429Z from user code: 2025-12-04T10:52:01.0914764Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.0914955Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.0914960Z 2025-12-04T10:52:01.0915674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.0915680Z 2025-12-04T10:52:01.0915686Z 2025-12-04T10:52:01.0915902Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.0916445Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.0916450Z 2025-12-04T10:52:01.0916711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.0916946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0917049Z frames [('total', 1)] 2025-12-04T10:52:01.0917264Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0918439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0918651Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0920252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 
2025-12-04T10:52:01.0920461Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0921599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0921825Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0923016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 2025-12-04T10:52:01.0923234Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0924363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0924566Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0925728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0925930Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0927114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T10:52:01.0927322Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0927550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0927652Z frames [('total', 1)] 2025-12-04T10:52:01.0927862Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:52:01.0929157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.) 2025-12-04T10:52:01.0929362Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0930958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.) 2025-12-04T10:52:01.0931166Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0932295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.) 2025-12-04T10:52:01.0932516Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0933641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.) 
2025-12-04T10:52:01.0933868Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0934993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.) 2025-12-04T10:52:01.0935210Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0936355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.) 2025-12-04T10:52:01.0936563Z return node.target(*args, **kwargs) # type: ignore[operator] 2025-12-04T10:52:01.0936791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:52:01.0936891Z frames [('total', 1)] 2025-12-04T10:52:01.0937603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml - 2025-12-04T10:52:01.0937787Z =========================== short test summary info ============================ 2025-12-04T10:52:01.1037657Z FAILED [0.0219s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function (*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. 
Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor const&, double, bool, std::optional, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist const&, double, bool, std::optional, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector 
>&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, double, bool, std::optional, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 from ??:0\n') 2025-12-04T10:52:01.1038137Z 2025-12-04T10:52:01.1038267Z from user code: 2025-12-04T10:52:01.1038664Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f 2025-12-04T10:52:01.1038800Z return F.scaled_dot_product_attention( 2025-12-04T10:52:01.1038805Z 2025-12-04T10:52:01.1039523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo" 2025-12-04T10:52:01.1039529Z 2025-12-04T10:52:01.1039534Z 2025-12-04T10:52:01.1039744Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1040343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.1040348Z 2025-12-04T10:52:01.1040609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1040787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:01.1040999Z ================== 1 failed, 95 deselected, 2 rerun in 14.98s ================== 2025-12-04T10:52:01.1041096Z Got exit code 1 2025-12-04T10:52:01.1041559Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned 2025-12-04T10:52:01.1041962Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:52:01.1042462Z W1204 10:49:34.543000 81195 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1042998Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml 2025-12-04T10:52:01.1043164Z ============================= test session starts ============================== 2025-12-04T10:52:01.1043522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:01.1043632Z cachedir: .pytest_cache 2025-12-04T10:52:01.1044148Z hypothesis 
profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:01.1044285Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:01.1044391Z configfile: pytest.ini 2025-12-04T10:52:01.1044926Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:01.1045155Z collecting ... collected 96 items / 19 deselected / 77 selected 2025-12-04T10:52:01.1045293Z stepcurrent: skipping 19 already run items. 2025-12-04T10:52:01.1045427Z Running 77 items in this shard 2025-12-04T10:52:01.1045432Z 2025-12-04T10:52:01.1045791Z inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean PASSED [3.8390s] [ 1%] 2025-12-04T10:52:01.1046154Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision PASSED [0.5811s] [ 2%] 2025-12-04T10:52:01.1046621Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain PASSED [1.0348s] [ 3%] 2025-12-04T10:52:01.1047059Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain PASSED [0.7899s] [ 5%] 2025-12-04T10:52:01.1047505Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding PASSED [0.1112s] [ 6%] 2025-12-04T10:52:01.1048319Z inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view W1204 10:49:43.102000 81195 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1048420Z PASSED [3.5055s] [ 7%] 2025-12-04T10:52:01.1048829Z inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs PASSED [0.5344s] [ 9%] 2025-12-04T10:52:01.1049289Z inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts PASSED [0.4844s] [ 10%] 2025-12-04T10:52:01.1049828Z inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic SKIPPED 
[0.0003s] (flash attention not supported) [ 11%] 2025-12-04T10:52:01.1050238Z inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants PASSED [0.6928s] [ 12%] 2025-12-04T10:52:01.1050829Z inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu SKIPPED [0.0032s] (uses bfloat16 atomic add instrs which requires SM >= 90) [ 14%] 2025-12-04T10:52:01.1051148Z inductor/test_cuda_repro.py::CudaReproTests::test_full_copy PASSED [0.1728s] [ 15%] 2025-12-04T10:52:01.1051465Z inductor/test_cuda_repro.py::CudaReproTests::test_identity_load PASSED [0.6059s] [ 16%] 2025-12-04T10:52:01.1052069Z inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback SKIPPED [0.0031s] (uses bfloat16 atomic add instrs which requires SM >= 90) [ 18%] 2025-12-04T10:52:01.1052491Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph PASSED [0.9950s] [ 19%] 2025-12-04T10:52:01.1052878Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph PASSED [0.4816s] [ 20%] 2025-12-04T10:52:01.1053224Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue PASSED [0.5307s] [ 22%] 2025-12-04T10:52:01.1053623Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph PASSED [0.5661s] [ 23%] 2025-12-04T10:52:01.1054011Z inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask PASSED [0.5823s] [ 24%] 2025-12-04T10:52:01.1054453Z inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate PASSED [0.0051s] [ 25%] 2025-12-04T10:52:01.1054834Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune PASSED [0.5240s] [ 27%] 2025-12-04T10:52:01.1055218Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune PASSED [0.5783s] [ 28%] 2025-12-04T10:52:01.1055598Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs PASSED [0.3349s] [ 29%] 2025-12-04T10:52:01.1055947Z 
inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last PASSED [0.6611s] [ 31%] 2025-12-04T10:52:01.1056519Z inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 32%] 2025-12-04T10:52:01.1056828Z inductor/test_cuda_repro.py::CudaReproTests::test_issue100806 PASSED [0.6958s] [ 33%] 2025-12-04T10:52:01.1057147Z inductor/test_cuda_repro.py::CudaReproTests::test_issue103461 PASSED [0.5002s] [ 35%] 2025-12-04T10:52:01.1057450Z inductor/test_cuda_repro.py::CudaReproTests::test_issue103481 PASSED [0.2437s] [ 36%] 2025-12-04T10:52:01.1057753Z inductor/test_cuda_repro.py::CudaReproTests::test_issue104759 PASSED [0.6785s] [ 37%] 2025-12-04T10:52:01.1058109Z inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input PASSED [0.2135s] [ 38%] 2025-12-04T10:52:01.1058443Z inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input PASSED [0.1810s] [ 40%] 2025-12-04T10:52:01.1058760Z inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924 PASSED [0.3615s] [ 41%] 2025-12-04T10:52:01.1059100Z inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing PASSED [0.6199s] [ 42%] 2025-12-04T10:52:01.1059440Z inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input PASSED [0.3390s] [ 44%] 2025-12-04T10:52:01.1059856Z inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size PASSED [0.1767s] [ 45%] 2025-12-04T10:52:01.1060206Z inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward PASSED [0.7036s] [ 46%] 2025-12-04T10:52:01.1060553Z inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd PASSED [3.8644s] [ 48%] 2025-12-04T10:52:01.1060934Z inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor PASSED [0.3135s] [ 49%] 2025-12-04T10:52:01.1061289Z inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile PASSED [0.1566s] [ 50%] 2025-12-04T10:52:01.1061689Z 
inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback PASSED [0.2141s] [ 51%] 2025-12-04T10:52:01.1062049Z inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor PASSED [0.1715s] [ 53%] 2025-12-04T10:52:01.1062903Z inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes W1204 10:50:03.479000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1063309Z W1204 10:50:05.478000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1063695Z W1204 10:50:12.909000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1064088Z W1204 10:50:12.919000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1064529Z W1204 10:50:12.929000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1064913Z W1204 10:50:12.940000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1065308Z W1204 10:50:12.950000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1065694Z W1204 10:50:12.961000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs 2025-12-04T10:52:01.1065812Z PASSED [9.5289s] [ 54%] 2025-12-04T10:52:01.1066210Z inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs PASSED [0.2367s] [ 55%] 2025-12-04T10:52:01.1066577Z inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op PASSED [2.2204s] [ 57%] 2025-12-04T10:52:01.1067032Z inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices PASSED [0.0035s] [ 58%] 2025-12-04T10:52:01.1067399Z inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one PASSED [0.2424s] [ 59%] 
2025-12-04T10:52:01.1067968Z inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device SKIPPED [0.0003s] (requires multiple cuda devices) [ 61%] 2025-12-04T10:52:01.1068292Z inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion PASSED [0.5552s] [ 62%] 2025-12-04T10:52:01.1068869Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile ('RERUN', {'yellow': True}) [0.0303s] [ 63%] 2025-12-04T10:52:01.1069453Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile ('RERUN', {'yellow': True}) [0.0045s] [ 63%] 2025-12-04T10:52:01.1069932Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile FAILED [0.0050s] [ 63%] 2025-12-04T10:52:01.1069938Z 2025-12-04T10:52:01.1070097Z ==================================== RERUNS ==================================== 2025-12-04T10:52:01.1070391Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1070510Z Traceback (most recent call last): 2025-12-04T10:52:01.1071111Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1071237Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1071601Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1071843Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1072213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1072340Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1072510Z RuntimeError: cutlassF: no kernel found to launch! 
2025-12-04T10:52:01.1072515Z 2025-12-04T10:52:01.1072732Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1073347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1073352Z 2025-12-04T10:52:01.1073614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1073916Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1074101Z Traceback (most recent call last): 2025-12-04T10:52:01.1074683Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1074815Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1075174Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1075405Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1075838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1075951Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1076128Z RuntimeError: cutlassF: no kernel found to launch! 
2025-12-04T10:52:01.1076134Z 2025-12-04T10:52:01.1076345Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1076950Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1076957Z 2025-12-04T10:52:01.1077226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1077371Z =================================== FAILURES =================================== 2025-12-04T10:52:01.1077672Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1077790Z Traceback (most recent call last): 2025-12-04T10:52:01.1078365Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1078506Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1078862Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1079092Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1079472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1079586Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1079763Z RuntimeError: cutlassF: no kernel found to launch! 
2025-12-04T10:52:01.1079768Z 2025-12-04T10:52:01.1079981Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1080578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1080597Z 2025-12-04T10:52:01.1080859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1081564Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml - 2025-12-04T10:52:01.1081746Z =========================== short test summary info ============================ 2025-12-04T10:52:01.1082514Z FAILED [0.0050s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1082521Z 2025-12-04T10:52:01.1082747Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1083345Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1083350Z 2025-12-04T10:52:01.1083608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1083800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:01.1084035Z ======= 1 failed, 43 passed, 5 skipped, 19 deselected, 2 rerun in 39.99s ======= 2025-12-04T10:52:01.1084131Z Got exit code 1 2025-12-04T10:52:01.1084250Z Retrying single test... 
2025-12-04T10:52:01.1084689Z W1204 10:50:27.877000 82859 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1085296Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml 2025-12-04T10:52:01.1085459Z ============================= test session starts ============================== 2025-12-04T10:52:01.1085802Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:01.1085921Z cachedir: .pytest_cache 2025-12-04T10:52:01.1086429Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:01.1086609Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:01.1086731Z configfile: pytest.ini 2025-12-04T10:52:01.1087263Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:01.1087491Z collecting ... collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:01.1088175Z stepcurrent: skipping 67 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1088289Z Running 1 items in this shard 2025-12-04T10:52:01.1088294Z 2025-12-04T10:52:01.1089250Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:29.258688608 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T10:52:01.1089763Z [W1204 10:50:29.258704409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:52:01.1089772Z 2025-12-04T10:52:01.1089914Z ('RERUN', {'yellow': True}) [15.5910s] [100%] 2025-12-04T10:52:01.1090892Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:45.858803773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.1090901Z 2025-12-04T10:52:01.1091044Z ('RERUN', {'yellow': True}) [0.0067s] [100%] 2025-12-04T10:52:01.1092015Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:45.864580178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.1092020Z 2025-12-04T10:52:01.1092118Z FAILED [0.0039s] [100%] 2025-12-04T10:52:01.1092141Z 2025-12-04T10:52:01.1092280Z ==================================== RERUNS ==================================== 2025-12-04T10:52:01.1092574Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1092710Z Traceback (most recent call last): 2025-12-04T10:52:01.1093290Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1093413Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1093790Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1094018Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1094395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1094510Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1094677Z RuntimeError: cutlassF: no kernel found to launch! 
2025-12-04T10:52:01.1095476Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1095591Z C++ CapturedTraceback: 2025-12-04T10:52:01.1096943Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1097423Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1097747Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1099078Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1101062Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1108213Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, 
std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1109628Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1111229Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1112109Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1116231Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, 
std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1117197Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1118330Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1121788Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1122475Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1123219Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1124043Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1128938Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1129211Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1129600Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1129865Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1130131Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1130498Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1130846Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1131157Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1131444Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1131741Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1132151Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1132520Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1132789Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1133151Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1133404Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1133782Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1134181Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1134558Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1134953Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1135317Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1135726Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1136089Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1136483Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1136860Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1137147Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1137409Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1137769Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1138110Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1138422Z 
#48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1138708Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1139016Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1139467Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1139830Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1140236Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1140598Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1140864Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1141282Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1141676Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1142051Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1142451Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1142815Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1143168Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1143466Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1143764Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1144029Z #64 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1144281Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1144655Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1145053Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1145429Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1145823Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1146181Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1146449Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1146809Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1147219Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1147581Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1147971Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1148346Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1148602Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1148964Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1149369Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1149727Z #80 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1150137Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1150499Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1150837Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1151212Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1151502Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1151809Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1152204Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1152563Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1152902Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1153263Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1153657Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1154032Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1154430Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1154804Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1155149Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1155444Z #96 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1155741Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1156040Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1156448Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1156819Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1157233Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1157618Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1158023Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1158406Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1158665Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1159040Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1159458Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1159826Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1160234Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1160615Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1160963Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1161277Z #112 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1161567Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1161870Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1162363Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1162734Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1163228Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1163599Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1164002Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1164385Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1164789Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1165308Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1165714Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1166082Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1166390Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1166692Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1166959Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1167254Z #128 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1167601Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1167943Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1168229Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1168496Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1168776Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1168973Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1169091Z #135 _start from ??:0 2025-12-04T10:52:01.1169210Z #136 from ??:0 2025-12-04T10:52:01.1169217Z 2025-12-04T10:52:01.1169222Z 2025-12-04T10:52:01.1169438Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1170060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1170066Z 2025-12-04T10:52:01.1170334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1170645Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1170766Z Traceback (most recent call last): 2025-12-04T10:52:01.1171351Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1171489Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1171853Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1172085Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1172466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", 
line 836, in __call__ 2025-12-04T10:52:01.1172580Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1172766Z RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1173558Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1173669Z C++ CapturedTraceback: 2025-12-04T10:52:01.1175022Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1175502Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1175846Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1177166Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1178930Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1186021Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist 
const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1187418Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1189026Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1189840Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1193974Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, 
bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1194923Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1196037Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1199494Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1200117Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1201019Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1201850Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1206873Z #20 pybind11::cpp_function::initialize, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1207163Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1207558Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1207821Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1208086Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1208456Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1208816Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1209113Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1209402Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1209709Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1210109Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T10:52:01.1210477Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1210743Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1211104Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1211373Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1211735Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1212130Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1212504Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1212901Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1213279Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1213675Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1214038Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1214449Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1214811Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1215111Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1215363Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1215725Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1216076Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1216380Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1216665Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1216974Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1217436Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1217812Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1218206Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1218567Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1218829Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1219249Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1219657Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1220017Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1220416Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1220789Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1221136Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1221445Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 
2025-12-04T10:52:01.1221730Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1221998Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1222261Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1222622Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1223017Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1223393Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1223786Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1224159Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1224410Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1224772Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1225185Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1225542Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1225949Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1226315Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1226566Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1226940Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1227335Z #79 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1227694Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1228104Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1228463Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1228814Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1229183Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1229472Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1229778Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1230176Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1230549Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1230858Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1231218Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1231625Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1231990Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1232397Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1232757Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1233095Z #95 
_PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1233404Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1233695Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1233988Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1234397Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1234769Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1235193Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1235564Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1235967Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1236350Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1236612Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1236998Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1237398Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1237766Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1238184Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1238553Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1238913Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1239216Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1239506Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1239820Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1240223Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1240591Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1241065Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1241433Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1241847Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1242273Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1242676Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1243125Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1243527Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1243908Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1244201Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1244502Z #126 run_eval_code_obj from 
/usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1244782Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1245063Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1245425Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1245746Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1246030Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1246312Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1246577Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1246773Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1246886Z #135 _start from ??:0 2025-12-04T10:52:01.1247008Z #136 from ??:0 2025-12-04T10:52:01.1247015Z 2025-12-04T10:52:01.1247019Z 2025-12-04T10:52:01.1247248Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1247858Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1247864Z 2025-12-04T10:52:01.1248130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1248287Z =================================== FAILURES =================================== 2025-12-04T10:52:01.1248577Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1248713Z Traceback (most recent call last): 2025-12-04T10:52:01.1249301Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1249425Z correct = 
forward(*example_inputs) 2025-12-04T10:52:01.1249797Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1250026Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1250391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1250517Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1250689Z RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1251491Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1251598Z C++ CapturedTraceback: 2025-12-04T10:52:01.1252964Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1253457Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1253785Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1255122Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1256880Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1263951Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, 
at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1265359Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1266972Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1267812Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 
2025-12-04T10:52:01.1271934Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1272872Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1274004Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1277423Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1278063Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1278799Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, 
pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1279637Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1284586Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1284917Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1285254Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1285522Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1285779Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1286170Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1286515Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1286821Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 
2025-12-04T10:52:01.1287126Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1287423Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1287843Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1288208Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1288477Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1288844Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1289097Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1289474Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1289870Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1290246Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1290647Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1291008Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1291417Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1291784Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1292197Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1292559Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1292847Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1293118Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1293484Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1293827Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1294140Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1294428Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1294804Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1295204Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1295567Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1295980Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1296345Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1296682Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1297046Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1297439Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1297819Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1298218Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1298580Z #60 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1298940Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1299238Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1299543Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1299808Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1300064Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1300437Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1300985Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1301362Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1301757Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1302120Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1302389Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1302751Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1303157Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1303519Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1303915Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1304291Z #76 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1304540Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1304900Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1305307Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1305670Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1306073Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1306432Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1306882Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1307195Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1307480Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1307788Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1308182Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1308630Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1308897Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1309260Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1309674Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1310032Z #92 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1310426Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1310800Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1311145Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1311448Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1311746Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1312041Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1312452Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1312829Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1313233Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1313617Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1314024Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1314409Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1314668Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1315036Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1315448Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1315820Z #108 _PyEval_EvalFrame 
from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1316234Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1316602Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1316949Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1317265Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1317559Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1317857Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1318276Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1318704Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1319122Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1319488Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1319891Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1320271Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1320733Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1321113Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1321514Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 
2025-12-04T10:52:01.1321888Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1322247Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1322555Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1322835Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1323115Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1323465Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1323799Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1324085Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1324351Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1324635Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1324832Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1324945Z #135 _start from ??:0 2025-12-04T10:52:01.1325064Z #136 from ??:0 2025-12-04T10:52:01.1325070Z 2025-12-04T10:52:01.1325074Z 2025-12-04T10:52:01.1325285Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1325913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1325922Z 2025-12-04T10:52:01.1326186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1326910Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml - 
2025-12-04T10:52:01.1327084Z =========================== short test summary info ============================ 2025-12-04T10:52:01.1327786Z FAILED [0.0039s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1328577Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1328686Z C++ CapturedTraceback: 2025-12-04T10:52:01.1329974Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1330450Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1330842Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1332178Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1333863Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1341016Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, 
std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1342425Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1344032Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1344791Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1348973Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous 
namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1349941Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1351052Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1354487Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1355110Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1355849Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1356678Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, 
pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1361585Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1361852Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1362232Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1362497Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1362830Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1363198Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1363539Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1363849Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1364138Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1364446Z #29 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1364844Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1365209Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1365473Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1365841Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1366094Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1366469Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1366868Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1367242Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1367639Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1367997Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1368403Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1368767Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1369178Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1369539Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1369831Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1370100Z #45 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1370462Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1370817Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1371224Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1371528Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1371841Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1372237Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1372600Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1373158Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1373523Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1373786Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1374147Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1374546Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1374976Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1375369Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1375743Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1376086Z #61 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1376383Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1376684Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1376945Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1377195Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1377569Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1377967Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1378344Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1378739Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1379105Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1379374Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1379734Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1380142Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1380504Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1380905Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1381278Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1381529Z #77 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1381908Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1382304Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1382665Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1383075Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1383437Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1383797Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1384096Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1384386Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1384758Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1385157Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1385518Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1385785Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1386147Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1386616Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1386979Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1387376Z #93 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1387757Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1388101Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1388412Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1388699Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1388992Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1389402Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1389780Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1390185Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1390571Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1390979Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1391365Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1391623Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1391989Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1392402Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1392772Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1393190Z #109 _PyObject_VectorcallTstate 
from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1393560Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1393917Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1394233Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1394524Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1394835Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1395234Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1395608Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1396021Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1396389Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1396874Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1397256Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1397658Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1398035Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1398437Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1398928Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1399225Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1399528Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1399811Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1400090Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1400436Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1400774Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1401273Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1401558Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1401824Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1402018Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1402221Z #135 _start from ??:0 2025-12-04T10:52:01.1402345Z #136 from ??:0 2025-12-04T10:52:01.1402351Z 2025-12-04T10:52:01.1402356Z 2025-12-04T10:52:01.1402575Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1403199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1403204Z 2025-12-04T10:52:01.1403469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1403664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:52:01.1403861Z ================== 1 failed, 95 deselected, 2 rerun in 15.64s ================== 2025-12-04T10:52:01.1403964Z Got exit code 1 2025-12-04T10:52:01.1404083Z Retrying single test... 
2025-12-04T10:52:01.1404525Z W1204 10:50:55.407000 82979 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1405065Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml 2025-12-04T10:52:01.1405230Z ============================= test session starts ============================== 2025-12-04T10:52:01.1405575Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:01.1405693Z cachedir: .pytest_cache 2025-12-04T10:52:01.1406203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:01.1406325Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:01.1406443Z configfile: pytest.ini 2025-12-04T10:52:01.1406980Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:01.1407203Z collecting ... collected 96 items / 95 deselected / 1 selected 2025-12-04T10:52:01.1407885Z stepcurrent: skipping 67 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1408132Z Running 1 items in this shard 2025-12-04T10:52:01.1408138Z 2025-12-04T10:52:01.1409097Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:57.793158683 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T10:52:01.1409610Z [W1204 10:50:57.793179857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:52:01.1409616Z 2025-12-04T10:52:01.1409849Z ('RERUN', {'yellow': True}) [15.4292s] [100%] 2025-12-04T10:52:01.1410822Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:51:12.231610084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.1410828Z 2025-12-04T10:52:01.1410968Z ('RERUN', {'yellow': True}) [0.0068s] [100%] 2025-12-04T10:52:01.1411947Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:51:12.237549278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:52:01.1411952Z 2025-12-04T10:52:01.1412052Z FAILED [0.0040s] [100%] 2025-12-04T10:52:01.1412057Z 2025-12-04T10:52:01.1412211Z ==================================== RERUNS ==================================== 2025-12-04T10:52:01.1412503Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1412643Z Traceback (most recent call last): 2025-12-04T10:52:01.1413229Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1413355Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1413732Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1413969Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1414337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1414464Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1414630Z RuntimeError: cutlassF: no kernel found to launch! 
2025-12-04T10:52:01.1415431Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1415540Z C++ CapturedTraceback: 2025-12-04T10:52:01.1416807Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1417297Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1417625Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1418964Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1420659Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1427799Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, 
std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1429257Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1430863Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1431616Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1435718Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, 
std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1436660Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1444286Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1447817Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1448488Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1449233Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1450051Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1454915Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1455189Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1455522Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1455786Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1456054Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1456421Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1456763Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1457168Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1457458Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1457768Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1458248Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1458613Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1458916Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1459280Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1459531Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1459907Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1460309Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1460683Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1461082Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1461443Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1461849Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1462209Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1462614Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1462974Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1463359Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1463626Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1464049Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1464404Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1464698Z 
#48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1464986Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1465293Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1465691Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1466056Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1466464Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1466822Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1467088Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1467449Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1467845Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1468218Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1468612Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1468985Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1469400Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1469695Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1470050Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1470311Z #64 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1470564Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1470975Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1471372Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1471791Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1472233Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1472598Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1472865Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1473228Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1473635Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1473998Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1474391Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1474759Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1475045Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1475405Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1475801Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1476180Z #80 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1476575Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1476955Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1477295Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1477592Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1477890Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1478189Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1478585Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1478963Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1479217Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1479595Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1479995Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1480358Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1480767Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1481197Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1481550Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1481887Z #96 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1482240Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1482552Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1482990Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1483382Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1483794Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1484169Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1484592Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1484962Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1485224Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1485606Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1486012Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1486398Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1486798Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1487168Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1487537Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1487840Z #112 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1488145Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1488443Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1488848Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1489231Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1489634Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1490020Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1490421Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1490786Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1491201Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1491568Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1491970Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1492349Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1492636Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1493015Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1493281Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1493559Z #128 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1493951Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1494268Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1494564Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1494881Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1495140Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1495349Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1495451Z #135 _start from ??:0 2025-12-04T10:52:01.1495575Z #136 from ??:0 2025-12-04T10:52:01.1495587Z 2025-12-04T10:52:01.1495607Z 2025-12-04T10:52:01.1495824Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1496439Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1496448Z 2025-12-04T10:52:01.1496723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1497013Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1497138Z Traceback (most recent call last): 2025-12-04T10:52:01.1497742Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1497865Z correct = forward(*example_inputs) 2025-12-04T10:52:01.1498238Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1498479Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1498845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", 
line 836, in __call__ 2025-12-04T10:52:01.1498973Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1499141Z RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1499942Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1500054Z C++ CapturedTraceback: 2025-12-04T10:52:01.1501515Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1502011Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1502339Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1503673Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1505477Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1512571Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist 
const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1514068Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1515678Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1516443Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1520565Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, 
bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1521516Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1522739Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1526223Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1526862Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1527593Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1528432Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1533270Z #20 pybind11::cpp_function::initialize, std::allocator > const&, 
std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1533544Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1533880Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1534147Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1534406Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1534789Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1535132Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1535489Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1535791Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1536083Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1536528Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T10:52:01.1536892Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1537176Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1537553Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1537804Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1538181Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1538585Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1538946Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1539357Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1539719Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1540131Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1540490Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1540883Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1541257Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1541547Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1541799Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1542174Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1542513Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1542822Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1543108Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1543401Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1543809Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1544174Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1544583Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1544943Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1545197Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1545573Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1545967Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1546324Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1546730Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1547172Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1547534Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1547831Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 
2025-12-04T10:52:01.1548163Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1548435Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1548686Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1549098Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1549494Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1549853Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1550263Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1550623Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1550889Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1551247Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1551640Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1552016Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1552409Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1552768Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1553035Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1553397Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1553802Z #79 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1554162Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1554556Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1554933Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1555274Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1555584Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1555877Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1556172Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1556582Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1556948Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1557212Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1557576Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1557972Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1558346Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1558741Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1559164Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1559519Z #95 
_PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1559851Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1560148Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1560439Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1560863Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1561249Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1561653Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1562038Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1562506Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1562879Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1563151Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1563519Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1563940Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1564307Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1564709Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1565098Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1565448Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1565754Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1566064Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1566367Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1566790Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1567164Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1567570Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1567956Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1568359Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1568741Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1569146Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1569514Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1569934Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1570301Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1570602Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1570976Z #126 run_eval_code_obj from 
/usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1571245Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1571538Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1571917Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1572237Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1572537Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1572842Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1573122Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1573318Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1573419Z #135 _start from ??:0 2025-12-04T10:52:01.1573557Z #136 from ??:0 2025-12-04T10:52:01.1573567Z 2025-12-04T10:52:01.1573572Z 2025-12-04T10:52:01.1573793Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1574421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1574430Z 2025-12-04T10:52:01.1574694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1574838Z =================================== FAILURES =================================== 2025-12-04T10:52:01.1575146Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____ 2025-12-04T10:52:01.1575266Z Traceback (most recent call last): 2025-12-04T10:52:01.1575852Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1575992Z correct = 
forward(*example_inputs) 2025-12-04T10:52:01.1576361Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward 2025-12-04T10:52:01.1576606Z torch.ops.aten._scaled_dot_product_efficient_attention.default( 2025-12-04T10:52:01.1576973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T10:52:01.1577090Z return self._op(*args, **kwargs) 2025-12-04T10:52:01.1577275Z RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1578067Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1578191Z C++ CapturedTraceback: 2025-12-04T10:52:01.1579469Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1579942Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1580283Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1581605Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1583445Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1590557Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, 
at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1592023Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1593616Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1594391Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 
2025-12-04T10:52:01.1598556Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1599457Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1600607Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1604331Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1604975Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1605707Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, 
pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1606537Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1611388Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1611675Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1611994Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1612273Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1612531Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1612898Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1613399Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1613701Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 
2025-12-04T10:52:01.1613988Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1614347Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1614748Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1615184Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1615439Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1615803Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1616070Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1616435Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1616843Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1617206Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1617600Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1617975Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1618366Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1618738Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1619132Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1619497Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1619797Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1620050Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1620411Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1620765Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1621065Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1621360Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1621653Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1622052Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1622427Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1622822Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1623203Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1623459Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1623824Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1624231Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1624592Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1625064Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1625424Z #60 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1625765Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1626109Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1626394Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1626685Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1626950Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1627313Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1627721Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1628087Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1628481Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1628856Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1629107Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1629478Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1629876Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1630238Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1630646Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1631010Z #76 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1631275Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1631634Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1632031Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1632401Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1632798Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1633158Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1633511Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1633811Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1634108Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1634400Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1634797Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1635169Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1635422Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1635796Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1636191Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1636551Z #92 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1637019Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1637380Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1637766Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1638063Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1638346Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1638678Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1639073Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1639444Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1639865Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1640234Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1640649Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1641019Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1641278Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1641661Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1642127Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1642513Z #108 _PyEval_EvalFrame 
from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1642919Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1643286Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1643650Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1643954Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1644245Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1644561Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1644965Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1645347Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1645752Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1646121Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1646536Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1646905Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1647324Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1647698Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1648099Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 
2025-12-04T10:52:01.1648479Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1648849Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1649167Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1649431Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1649743Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1650102Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1650453Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1650735Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1651012Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1651277Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1651486Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1651586Z #135 _start from ??:0 2025-12-04T10:52:01.1651704Z #136 from ??:0 2025-12-04T10:52:01.1651710Z 2025-12-04T10:52:01.1651715Z 2025-12-04T10:52:01.1651946Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1652560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1652566Z 2025-12-04T10:52:01.1652843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1653546Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml - 
2025-12-04T10:52:01.1653717Z =========================== short test summary info ============================ 2025-12-04T10:52:01.1654432Z FAILED [0.0040s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch! 2025-12-04T10:52:01.1655215Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first): 2025-12-04T10:52:01.1655356Z C++ CapturedTraceback: 2025-12-04T10:52:01.1656632Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T10:52:01.1657124Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T10:52:01.1657457Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T10:52:01.1658782Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1660488Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1667634Z #9 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, 
std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1669115Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, std::optional const&, std::optional, std::optional, double, long, bool, std::optional, std::optional const&, std::optional) from ??:0 2025-12-04T10:52:01.1670719Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&) const from x_0.cudafe1.cpp:0 2025-12-04T10:52:01.1671487Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1675591Z #13 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &at::(anonymous 
namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T10:52:01.1676491Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from ??:0 2025-12-04T10:52:01.1677680Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional) from VariableType_3.cpp:0 2025-12-04T10:52:01.1681162Z #16 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, bool, double, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple, c10::guts::typelist::typelist const&, bool, double, bool, std::optional > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_3.cpp:0 2025-12-04T10:52:01.1681835Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T10:52:01.1682625Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T10:52:01.1683467Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, 
pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T10:52:01.1688294Z #20 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T10:52:01.1688581Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T10:52:01.1688902Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T10:52:01.1689189Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1689443Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T10:52:01.1689812Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1690171Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1690470Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1690774Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1691140Z #29 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1691542Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1691920Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1692205Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1692581Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1692878Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1693244Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1693654Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1694019Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1694415Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1694792Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1695190Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1695567Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1695962Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1696325Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1696622Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T10:52:01.1696871Z #45 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1697248Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1697589Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1697885Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1698181Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1698474Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1698886Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1699247Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1699645Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1700023Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1700277Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1700639Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1701279Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1701645Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1702055Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1702415Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1702758Z #61 _PyObject_FastCallDictTstate from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1703207Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1703497Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1703772Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T10:52:01.1704071Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1704434Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1704842Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1705244Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1705638Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1706013Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1706269Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1706643Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1707037Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1707398Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1707806Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1708168Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1708434Z #77 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1708794Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1709192Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1709562Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1709955Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1710327Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1710669Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1710970Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1711275Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1711570Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1711969Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1712342Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1712593Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1712968Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1713362Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1713725Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1714134Z #93 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1714493Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1714847Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1715203Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1715489Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1715827Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1716224Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1716704Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1717168Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1717539Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1717954Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1718329Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1718589Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T10:52:01.1718972Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1719378Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1719760Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1720163Z #109 _PyObject_VectorcallTstate 
from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1720532Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1720895Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T10:52:01.1721201Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T10:52:01.1721504Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T10:52:01.1721804Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T10:52:01.1722266Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T10:52:01.1722655Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1723060Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1723446Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1723848Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1724221Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1724640Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1725008Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T10:52:01.1725412Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T10:52:01.1725795Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T10:52:01.1726085Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T10:52:01.1726399Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T10:52:01.1726664Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T10:52:01.1726944Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T10:52:01.1727379Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T10:52:01.1727700Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T10:52:01.1728028Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T10:52:01.1728295Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T10:52:01.1728555Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T10:52:01.1728790Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T10:52:01.1728891Z #135 _start from ??:0 2025-12-04T10:52:01.1729009Z #136 from ??:0 2025-12-04T10:52:01.1729015Z 2025-12-04T10:52:01.1729032Z 2025-12-04T10:52:01.1729246Z To execute this test, run the following from the base repo dir: 2025-12-04T10:52:01.1729855Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1729862Z 2025-12-04T10:52:01.1730138Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:52:01.1730315Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:52:01.1730509Z ================== 1 failed, 95 deselected, 2 rerun in 15.47s ================== 2025-12-04T10:52:01.1730619Z Got exit code 1 2025-12-04T10:52:01.1731148Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile 2025-12-04T10:52:01.1731564Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:52:01.1732001Z W1204 10:51:22.967000 83099 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1732531Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml 2025-12-04T10:52:01.1732704Z ============================= test session starts ============================== 2025-12-04T10:52:01.1733050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:52:01.1733173Z cachedir: .pytest_cache 2025-12-04T10:52:01.1733679Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:52:01.1733804Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:52:01.1733922Z configfile: pytest.ini 2025-12-04T10:52:01.1734452Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:52:01.1734666Z collecting ... collected 96 items / 68 deselected / 28 selected 2025-12-04T10:52:01.1734817Z stepcurrent: skipping 68 already run items. 
2025-12-04T10:52:01.1734932Z Running 28 items in this shard 2025-12-04T10:52:01.1734941Z 2025-12-04T10:52:01.1735312Z inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch PASSED [2.8342s] [ 3%] 2025-12-04T10:52:01.1735698Z inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order PASSED [0.6937s] [ 7%] 2025-12-04T10:52:01.1736061Z inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load PASSED [0.4319s] [ 10%] 2025-12-04T10:52:01.1736422Z inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index PASSED [0.1718s] [ 14%] 2025-12-04T10:52:01.1737349Z inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward W1204 10:51:30.245000 83099 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:52:01.1737470Z PASSED [1.7852s] [ 17%] 2025-12-04T10:52:01.1737853Z inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped PASSED [0.5743s] [ 21%] 2025-12-04T10:52:01.1738579Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16 PASSED [0.5497s] [ 25%] 2025-12-04T10:52:01.1739256Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16 PASSED [0.5514s] [ 28%] 2025-12-04T10:52:01.1739968Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16 PASSED [0.5632s] [ 32%] 2025-12-04T10:52:01.1740637Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16 PASSED [0.5587s] [ 35%] 2025-12-04T10:52:01.1741323Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16 PASSED [0.5434s] [ 39%] 2025-12-04T10:52:01.1741983Z 
inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16 PASSED [0.5811s] [ 42%] 2025-12-04T10:52:01.1742650Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16 PASSED [0.6076s] [ 46%] 2025-12-04T10:52:01.1743308Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16 PASSED [0.5803s] [ 50%] 2025-12-04T10:52:01.1743727Z inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address PASSED [2.1277s] [ 53%] 2025-12-04T10:52:01.1744049Z inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims PASSED [0.7630s] [ 57%] 2025-12-04T10:52:01.1744386Z inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue PASSED [0.3434s] [ 60%] 2025-12-04T10:52:01.1744712Z inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks PASSED [0.5264s] [ 64%] 2025-12-04T10:52:01.1745108Z inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last PASSED [0.2339s] [ 67%] 2025-12-04T10:52:01.1745505Z inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed PASSED [0.0896s] [ 71%] 2025-12-04T10:52:01.1745845Z inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret PASSED [13.4658s] [ 75%] 2025-12-04T10:52:01.1746267Z inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent PASSED [0.4508s] [ 78%] 2025-12-04T10:52:01.1746691Z inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding PASSED [2.3861s] [ 82%] 2025-12-04T10:52:01.1747010Z inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy PASSED [0.0849s] [ 85%] 2025-12-04T10:52:01.1747379Z inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop PASSED [0.8697s] [ 89%] 2025-12-04T10:52:01.1747763Z 
inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs PASSED [0.3046s] [ 92%] 2025-12-04T10:52:01.1748164Z inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328 PASSED [0.6225s] [ 96%] 2025-12-04T10:52:01.1748531Z inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro PASSED [0.5718s] [100%] 2025-12-04T10:52:01.1748536Z 2025-12-04T10:52:01.1757085Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml - 2025-12-04T10:52:01.1757308Z ====================== 28 passed, 68 deselected in 33.94s ====================== 2025-12-04T10:52:01.1758808Z The following tests failed consistently: ['test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses', 'test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned', 'test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile'] 2025-12-04T10:52:01.1758821Z 2025-12-04T10:52:01.1759339Z FINISHED PRINTING LOG FILE of inductor/test_cuda_repro 1/1 (test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log) 2025-12-04T10:52:01.1759345Z 2025-12-04T10:52:01.1759841Z Finished inductor/test_cuda_repro 1/1 ... 
[2025-12-04 10:52:00.889853][5878.499753263], took 5.02min 2025-12-04T10:52:01.1760608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml 2025-12-04T10:52:01.1761418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml 2025-12-04T10:52:01.1762325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml 2025-12-04T10:52:01.1763098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml 2025-12-04T10:52:01.1763868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml 2025-12-04T10:52:01.1764617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml 2025-12-04T10:52:01.1765378Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml 2025-12-04T10:52:01.1822892Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml 2025-12-04T10:52:01.2121505Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml 2025-12-04T10:52:01.2475120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml 
2025-12-04T10:52:01.5867737Z Uploading logs for 57119749427 to S3 2025-12-04T10:52:01.6405488Z Uploading artifacts took 0.37 seconds 2025-12-04T10:52:01.6405919Z inductor/test_cuda_repro 1/1 failed! 2025-12-04T10:52:01.6410889Z Running inductor/test_cudagraph_trees 1/1 ... [2025-12-04 10:52:01.640889][5879.250796684] 2025-12-04T10:52:01.6411484Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:52:01.6415405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:52:01.641316] 2025-12-04T10:55:24.4406254Z 2025-12-04T10:55:24.4407193Z PRINTING LOG FILE of inductor/test_cudagraph_trees 1/1 (test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log) 2025-12-04T10:55:24.4408522Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml 2025-12-04T10:55:24.4409424Z ============================= test session starts ============================== 2025-12-04T10:55:24.4410082Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:55:24.4410684Z cachedir: .pytest_cache 2025-12-04T10:55:24.4411393Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:55:24.4412174Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:55:24.4412517Z configfile: pytest.ini 2025-12-04T10:55:24.4413232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:55:24.4414023Z collecting ... 
collected 166 items 2025-12-04T10:55:24.4414437Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:55:24.4493427Z Running 166 items in this shard: test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_grad, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_multiple_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_alias_of_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_output_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_storage_single_weakref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliasing_static_ref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_amp_cache_disabled, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cleanup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_constant_output, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_conv_benchmark, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_or_error, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_warmup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_cpu_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_storage, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_end_recording_early, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_execution_into_recording, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_expanded_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_generation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_frozen_fn, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_function_compiled_multiple_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_condition_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_only, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation_late_free, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_rule, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_foreach_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_gc, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_False, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_True, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_log_message, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_simple, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_view_fallback, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_with_memory_plan_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_index_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_manager_per_device, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mark_step, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_meta_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_child_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_parent_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multinomial, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_insert_removal_caching, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_reinplaced, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_output_alias, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_peristed_output_livenes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_non_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_run_simple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_separate_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_side_stream_memory_allocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_single_stream_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_symbolic, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_sparsity, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_storage_access_error, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_constant_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unstable_ptr, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warmup_stream_sync, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_on_pending_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_workspace_allocation_error, test/inductor/test_cudagraph_trees.py::TestSAC::test_cpu_and_cuda_rng, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraph_uneven_forward_backward, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal_device_one, test/inductor/test_cudagraph_trees.py::TestSAC::test_graph_partition_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_multi_device, test/inductor/test_cudagraph_trees.py::TestSAC::test_retain_graph, test/inductor/test_cudagraph_trees.py::TestSAC::test_simple, 
test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order0, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order1, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order2, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order3, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order4, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order5 2025-12-04T10:55:24.4570964Z 2025-12-04T10:55:24.4571398Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_grad PASSED [4.7031s] [ 0%] 2025-12-04T10:55:24.4572411Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_multiple_recordings PASSED [1.5793s] [ 1%] 2025-12-04T10:55:24.4573410Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_alias_of_parameter PASSED [0.3948s] [ 1%] 2025-12-04T10:55:24.4574539Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_output_checkpoint PASSED [0.1994s] [ 2%] 2025-12-04T10:55:24.4575562Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_static_parameter PASSED [0.1922s] [ 3%] 2025-12-04T10:55:24.4577161Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_storage_single_weakref W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] Graph break from `Tensor.item()`, consider setting: 2025-12-04T10:55:24.4578821Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] torch._dynamo.config.capture_scalar_outputs = True 2025-12-04T10:55:24.4579852Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] or: 2025-12-04T10:55:24.4580803Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1 2025-12-04T10:55:24.4581971Z W1204 10:52:18.072000 84145 
site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] to include these operations in the captured graph. 2025-12-04T10:55:24.4582933Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 2025-12-04T10:55:24.4583805Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] Graph break: from user code at: 2025-12-04T10:55:24.4585307Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 1280, in torch_dynamo_resume_in_foo_at_1278 2025-12-04T10:55:24.4586748Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] x_alias2 = x[ind:] 2025-12-04T10:55:24.4587588Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 2025-12-04T10:55:24.4588307Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 2025-12-04T10:55:24.4588853Z PASSED [0.4177s] [ 3%] 2025-12-04T10:55:24.4589882Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliasing_static_ref W1204 10:52:19.384000 84145 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:55:24.4590953Z PASSED [1.5164s] [ 4%] 2025-12-04T10:55:24.4591527Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_amp_cache_disabled PASSED [0.7626s] [ 4%] 2025-12-04T10:55:24.4592541Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs PASSED [1.8677s] [ 5%] 2025-12-04T10:55:24.4593605Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward PASSED [1.7867s] [ 6%] 2025-12-04T10:55:24.4594817Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index SKIPPED [0.0004s] (requires multiple cuda devices) [ 6%] 2025-12-04T10:55:24.4596019Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_forward_backward PASSED [1.3256s] [ 7%] 2025-12-04T10:55:24.4597122Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation PASSED [0.2036s] [ 7%] 2025-12-04T10:55:24.4598282Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs PASSED [0.4354s] [ 8%] 2025-12-04T10:55:24.4599259Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cleanup PASSED [0.6753s] [ 9%] 2025-12-04T10:55:24.4600236Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params PASSED [1.0392s] [ 9%] 2025-12-04T10:55:24.4601405Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_constant_output PASSED [0.7132s] [ 10%] 2025-12-04T10:55:24.4602361Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_conv_benchmark PASSED [2.0846s] [ 10%] 2025-12-04T10:55:24.4603249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cpp_wrapper PASSED [2.3583s] [ 11%] 2025-12-04T10:55:24.4604302Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes PASSED [1.0990s] [ 12%] 2025-12-04T10:55:24.4605309Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1 PASSED [0.5442s] [ 12%] 2025-12-04T10:55:24.4606311Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2 PASSED [0.5570s] [ 13%] 2025-12-04T10:55:24.4607323Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_or_error PASSED [0.3805s] [ 13%] 2025-12-04T10:55:24.4608267Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_backward PASSED [1.5615s] [ 14%] 2025-12-04T10:55:24.4609232Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_warmup PASSED [0.2247s] [ 15%] 2025-12-04T10:55:24.4610148Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_cpu_tensor PASSED [0.4109s] [ 15%] 
2025-12-04T10:55:24.4611048Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_storage PASSED [0.7172s] [ 16%] 2025-12-04T10:55:24.4611971Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_end_recording_early PASSED [0.7367s] [ 16%] 2025-12-04T10:55:24.4612925Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use PASSED [0.3945s] [ 17%] 2025-12-04T10:55:24.4613885Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use2 PASSED [0.3941s] [ 18%] 2025-12-04T10:55:24.4614853Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_execution_into_recording PASSED [0.7579s] [ 18%] 2025-12-04T10:55:24.4615815Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_expanded_inputs PASSED [0.4273s] [ 19%] 2025-12-04T10:55:24.4616873Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times PASSED [0.4838s] [ 19%] 2025-12-04T10:55:24.4618224Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor PASSED [0.5743s] [ 20%] 2025-12-04T10:55:24.4619627Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once PASSED [0.4842s] [ 21%] 2025-12-04T10:55:24.4620761Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward PASSED [0.8052s] [ 21%] 2025-12-04T10:55:24.4621829Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs PASSED [0.4135s] [ 22%] 2025-12-04T10:55:24.4623026Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor PASSED [0.6667s] [ 22%] 2025-12-04T10:55:24.4624080Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_generation PASSED [0.8775s] [ 23%] 2025-12-04T10:55:24.4625128Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward PASSED [0.5242s] [ 24%] 2025-12-04T10:55:24.4626129Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_frozen_fn PASSED [0.3914s] [ 24%] 2025-12-04T10:55:24.4627095Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_function_compiled_multiple_times PASSED [0.6896s] [ 25%] 2025-12-04T10:55:24.4628455Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition W1204 10:52:47.385000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4629736Z W1204 10:52:47.387000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4630366Z PASSED [1.0857s] [ 25%] 2025-12-04T10:55:24.4631361Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse W1204 10:52:48.525000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4632682Z W1204 10:52:48.527000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4633589Z W1204 10:52:48.531000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4634552Z W1204 10:52:48.533000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4635179Z PASSED [1.1736s] [ 26%] 2025-12-04T10:55:24.4635796Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_condition_op PASSED [1.0909s] [ 27%] 2025-12-04T10:55:24.4636860Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_only PASSED [1.7430s] [ 27%] 2025-12-04T10:55:24.4638309Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes W1204 10:52:52.484000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 
2025-12-04T10:55:24.4639729Z W1204 10:52:52.486000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4640621Z W1204 10:52:53.443000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4641527Z W1204 10:52:53.445000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4642150Z PASSED [2.2289s] [ 28%] 2025-12-04T10:55:24.4643186Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1 W1204 10:52:54.632000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4644233Z PASSED [0.9696s] [ 28%] 2025-12-04T10:55:24.4645216Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2 W1204 10:52:55.609000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4646261Z PASSED [0.9850s] [ 29%] 2025-12-04T10:55:24.4647230Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3 W1204 10:52:56.599000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4648267Z PASSED [0.9807s] [ 30%] 2025-12-04T10:55:24.4649249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4 W1204 10:52:57.580000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4650285Z PASSED [0.9836s] [ 30%] 2025-12-04T10:55:24.4651303Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put W1204 10:52:58.558000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4652681Z W1204 10:52:58.559000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4653306Z PASSED [0.9402s] [ 31%] 
2025-12-04T10:55:24.4654331Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple W1204 10:52:59.506000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4655401Z PASSED [0.9976s] [ 31%] 2025-12-04T10:55:24.4656422Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation W1204 10:53:00.497000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4657793Z W1204 10:53:00.500000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4658419Z PASSED [0.9565s] [ 32%] 2025-12-04T10:55:24.4659060Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints PASSED [2.0533s] [ 33%] 2025-12-04T10:55:24.4660109Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op PASSED [0.6010s] [ 33%] 2025-12-04T10:55:24.4661200Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes PASSED [0.8776s] [ 34%] 2025-12-04T10:55:24.4662328Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation PASSED [0.4727s] [ 34%] 2025-12-04T10:55:24.4663452Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation_late_free PASSED [0.5923s] [ 35%] 2025-12-04T10:55:24.4664685Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split PASSED [0.7775s] [ 36%] 2025-12-04T10:55:24.4665743Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_rule PASSED [0.9076s] [ 36%] 2025-12-04T10:55:24.4667223Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs W1204 10:53:07.928000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4668628Z 
W1204 10:53:07.930000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4669537Z W1204 10:53:08.783000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4670437Z W1204 10:53:08.786000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4671064Z PASSED [1.5044s] [ 37%] 2025-12-04T10:55:24.4671703Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes PASSED [0.5995s] [ 37%] 2025-12-04T10:55:24.4672747Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_foreach_op PASSED [0.4453s] [ 38%] 2025-12-04T10:55:24.4674178Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward W1204 10:53:10.394000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4675535Z W1204 10:53:10.399000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4676150Z PASSED [1.3426s] [ 39%] 2025-12-04T10:55:24.4676847Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called PASSED [0.6847s] [ 39%] 2025-12-04T10:55:24.4678090Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward PASSED [0.5465s] [ 40%] 2025-12-04T10:55:24.4679311Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node PASSED [0.4658s] [ 40%] 2025-12-04T10:55:24.4680329Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_gc PASSED [0.6708s] [ 41%] 2025-12-04T10:55:24.4681286Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_item PASSED [0.4352s] [ 42%] 2025-12-04T10:55:24.4682854Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_False W1204 10:53:14.489000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4684345Z W1204 10:53:14.491000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4685238Z W1204 10:53:14.492000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4685864Z PASSED [1.0453s] [ 42%] 2025-12-04T10:55:24.4686999Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_True W1204 10:53:15.531000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4688461Z W1204 10:53:15.533000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4689355Z W1204 10:53:15.534000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4689977Z PASSED [1.0688s] [ 43%] 2025-12-04T10:55:24.4690605Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_log_message PASSED [0.9958s] [ 43%] 2025-12-04T10:55:24.4691827Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg SKIPPED [0.0003s] (requires multiple cuda devices) [ 44%] 2025-12-04T10:55:24.4693631Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness W1204 10:53:17.656000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4695077Z W1204 10:53:17.658000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4695699Z PASSED [1.0819s] [ 45%] 2025-12-04T10:55:24.4696406Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu PASSED [1.1381s] [ 45%] 2025-12-04T10:55:24.4697918Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave W1204 10:53:19.938000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4699085Z PASSED [1.2184s] [ 46%] 2025-12-04T10:55:24.4699833Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency PASSED [0.8235s] [ 46%] 2025-12-04T10:55:24.4701249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1 PASSED [0.8916s] [ 47%] 2025-12-04T10:55:24.4702757Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_simple W1204 10:53:22.818000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4704071Z W1204 10:53:22.820000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4704702Z PASSED [1.1522s] [ 48%] 2025-12-04T10:55:24.4705653Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint W1204 10:53:23.973000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4706963Z W1204 10:53:23.975000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4707871Z W1204 10:53:24.891000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4708772Z W1204 10:53:24.893000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program 2025-12-04T10:55:24.4709380Z PASSED [2.0956s] [ 48%] 2025-12-04T10:55:24.4710050Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward PASSED [1.5954s] [ 49%] 
2025-12-04T10:55:24.4711186Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index PASSED [0.8360s] [ 50%] 2025-12-04T10:55:24.4712409Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing PASSED [0.6498s] [ 50%] 2025-12-04T10:55:24.4713941Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint W1204 10:53:29.215000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4715300Z W1204 10:53:29.217000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4715920Z PASSED [1.1823s] [ 51%] 2025-12-04T10:55:24.4716666Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout PASSED [0.9958s] [ 51%] 2025-12-04T10:55:24.4718263Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:31.629000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4719696Z W1204 10:53:31.631000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4720349Z ('RERUN', {'yellow': True}) [1.4081s] [ 52%] 2025-12-04T10:55:24.4721521Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:32.756000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4723009Z W1204 10:53:32.758000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4723651Z ('RERUN', {'yellow': True}) [1.3146s] [ 52%] 2025-12-04T10:55:24.4724960Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:34.075000 84145 
site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4726392Z W1204 10:53:34.076000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4727045Z FAILED [1.3174s] [ 52%] 2025-12-04T10:55:24.4727761Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse ERROR [0.0001s] [ 52%] 2025-12-04T10:55:24.4728502Z 2025-12-04T10:55:24.4728646Z ==================================== RERUNS ==================================== 2025-12-04T10:55:24.4729242Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___ 2025-12-04T10:55:24.4729788Z Traceback (most recent call last): 2025-12-04T10:55:24.4730642Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4731533Z self.assertEqual(eager_out, compiled_out) 2025-12-04T10:55:24.4732258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T10:55:24.4732996Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T10:55:24.4733819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T10:55:24.4734687Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T10:55:24.4735154Z AssertionError: Tensor-likes are not close! 
2025-12-04T10:55:24.4735430Z 2025-12-04T10:55:24.4735549Z Mismatched elements: 64 / 128 (50.0%) 2025-12-04T10:55:24.4736105Z Greatest absolute difference: 2.7803521156311035 at index (65,) (up to 1e-05 allowed) 2025-12-04T10:55:24.4736801Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed) 2025-12-04T10:55:24.4737206Z 2025-12-04T10:55:24.4737330Z The failure occurred for item [0] 2025-12-04T10:55:24.4737567Z 2025-12-04T10:55:24.4737780Z To execute this test, run the following from the base repo dir: 2025-12-04T10:55:24.4738795Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4739584Z 2025-12-04T10:55:24.4739848Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:55:24.4740477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4740955Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4741331Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4741918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4743019Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4743913Z graph_break [] 2025-12-04T10:55:24.4744270Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4744819Z cudagraph partition due to non gpu ops. 
Found from : 2025-12-04T10:55:24.4745514Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4746151Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4746382Z 2025-12-04T10:55:24.4746516Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4746973Z cudagraph partition due to DeviceCopy ops. Found from : 2025-12-04T10:55:24.4747672Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4748289Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4748508Z 2025-12-04T10:55:24.4748635Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4749265Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___ 2025-12-04T10:55:24.4749839Z Traceback (most recent call last): 2025-12-04T10:55:24.4750668Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4751594Z self.assertEqual(eager_out, compiled_out) 2025-12-04T10:55:24.4752323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T10:55:24.4753055Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T10:55:24.4753899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T10:55:24.4754768Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T10:55:24.4755249Z AssertionError: Tensor-likes are not close! 
2025-12-04T10:55:24.4755510Z 2025-12-04T10:55:24.4755628Z Mismatched elements: 64 / 128 (50.0%) 2025-12-04T10:55:24.4756179Z Greatest absolute difference: 2.7356221675872803 at index (90,) (up to 1e-05 allowed) 2025-12-04T10:55:24.4756891Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed) 2025-12-04T10:55:24.4757279Z 2025-12-04T10:55:24.4757413Z The failure occurred for item [0] 2025-12-04T10:55:24.4757638Z 2025-12-04T10:55:24.4757848Z To execute this test, run the following from the base repo dir: 2025-12-04T10:55:24.4758849Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4759635Z 2025-12-04T10:55:24.4759910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:55:24.4760529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4760983Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4761357Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4761952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4763091Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4763973Z graph_break [] 2025-12-04T10:55:24.4764340Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4764884Z cudagraph partition due to non gpu ops. 
Found from : 2025-12-04T10:55:24.4765557Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4766311Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4766545Z 2025-12-04T10:55:24.4766687Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4767126Z cudagraph partition due to DeviceCopy ops. Found from : 2025-12-04T10:55:24.4767825Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4768456Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4768665Z 2025-12-04T10:55:24.4768809Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4769259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4769732Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4770103Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4770683Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4771777Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4772657Z graph_break [] 2025-12-04T10:55:24.4773026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4773549Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4774341Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4774986Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4775215Z 2025-12-04T10:55:24.4775345Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4775830Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4776525Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4777148Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4777354Z 2025-12-04T10:55:24.4777521Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4777912Z ==================================== ERRORS ==================================== 2025-12-04T10:55:24.4778567Z _ ERROR at teardown of CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse _ 2025-12-04T10:55:24.4779189Z Traceback (most recent call last): 2025-12-04T10:55:24.4779805Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 174, in tearDown 2025-12-04T10:55:24.4780482Z self.assertEqual(all_live_block_count(), 0) 2025-12-04T10:55:24.4781215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T10:55:24.4781955Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T10:55:24.4782773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T10:55:24.4783641Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T10:55:24.4784106Z AssertionError: Scalars are not equal! 2025-12-04T10:55:24.4784353Z 2025-12-04T10:55:24.4784457Z Expected 0 but got 2. 
2025-12-04T10:55:24.4784741Z Absolute difference: 2 2025-12-04T10:55:24.4785029Z Relative difference: inf 2025-12-04T10:55:24.4785425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4785897Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4786271Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4786849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4787944Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4788826Z graph_break [] 2025-12-04T10:55:24.4789200Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4789723Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4790403Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4791035Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4791265Z 2025-12-04T10:55:24.4791391Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4791995Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4792702Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4793321Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4793529Z 2025-12-04T10:55:24.4793658Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4794129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4794601Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4794958Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4795557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4796652Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4797534Z graph_break [] 2025-12-04T10:55:24.4797895Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4798531Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4799214Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4799855Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4800126Z 2025-12-04T10:55:24.4800256Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4800713Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4801572Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4802651Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4802877Z 2025-12-04T10:55:24.4803007Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4803473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4803925Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4804298Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4804897Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4805986Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4806855Z graph_break [] 2025-12-04T10:55:24.4807219Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4807754Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4808436Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4809053Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4809295Z 2025-12-04T10:55:24.4809424Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4809868Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4810548Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4811171Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4811377Z 2025-12-04T10:55:24.4811516Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4811911Z =================================== FAILURES =================================== 2025-12-04T10:55:24.4812497Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___ 2025-12-04T10:55:24.4813065Z Traceback (most recent call last): 2025-12-04T10:55:24.4813909Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4814790Z self.assertEqual(eager_out, compiled_out) 2025-12-04T10:55:24.4815516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T10:55:24.4816268Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T10:55:24.4817088Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T10:55:24.4817942Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T10:55:24.4818420Z AssertionError: Tensor-likes are not close! 
2025-12-04T10:55:24.4818684Z 2025-12-04T10:55:24.4818818Z Mismatched elements: 64 / 128 (50.0%) 2025-12-04T10:55:24.4819347Z Greatest absolute difference: 2.709859848022461 at index (126,) (up to 1e-05 allowed) 2025-12-04T10:55:24.4820051Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed) 2025-12-04T10:55:24.4820450Z 2025-12-04T10:55:24.4820566Z The failure occurred for item [0] 2025-12-04T10:55:24.4820789Z 2025-12-04T10:55:24.4821010Z To execute this test, run the following from the base repo dir: 2025-12-04T10:55:24.4821993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4822792Z 2025-12-04T10:55:24.4823169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:55:24.4823800Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4824273Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4824633Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4825282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4826377Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4827286Z graph_break [] 2025-12-04T10:55:24.4827645Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4828185Z cudagraph partition due to non gpu ops. 
Found from : 2025-12-04T10:55:24.4828868Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4829490Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4829733Z 2025-12-04T10:55:24.4829861Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4830305Z cudagraph partition due to DeviceCopy ops. Found from : 2025-12-04T10:55:24.4831001Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4831616Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4831837Z 2025-12-04T10:55:24.4831964Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4832423Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4832879Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4833247Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4833836Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4834931Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4835801Z graph_break [] 2025-12-04T10:55:24.4836169Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4836699Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4837364Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4837999Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4838238Z 2025-12-04T10:55:24.4838368Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4838816Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4839505Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4840126Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4840332Z 2025-12-04T10:55:24.4840476Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4840928Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:55:24.4841392Z frames [('total', 1), ('ok', 1)] 2025-12-04T10:55:24.4841767Z stats [('calls_captured', 7), ('unique_graphs', 1)] 2025-12-04T10:55:24.4842461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T10:55:24.4843550Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)] 2025-12-04T10:55:24.4844443Z graph_break [] 2025-12-04T10:55:24.4844818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T10:55:24.4845345Z cudagraph partition due to non gpu ops. Found from : 2025-12-04T10:55:24.4846026Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo 2025-12-04T10:55:24.4846659Z output1_cpu = output1.cpu() + 1 2025-12-04T10:55:24.4846887Z 2025-12-04T10:55:24.4847026Z cudagraph partition due to non gpu ops 2025-12-04T10:55:24.4847549Z cudagraph partition due to DeviceCopy ops. 
Found from : 2025-12-04T10:55:24.4848251Z File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo 2025-12-04T10:55:24.4848910Z x2 = output1_cpu.to("cuda") 2025-12-04T10:55:24.4849116Z 2025-12-04T10:55:24.4849247Z cudagraph partition into 3 partitions 2025-12-04T10:55:24.4850260Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml - 2025-12-04T10:55:24.4851395Z =========================== short test summary info ============================ 2025-12-04T10:55:24.4852463Z FAILED [1.3174s] inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse - AssertionError: Tensor-likes are not close! 2025-12-04T10:55:24.4853309Z 2025-12-04T10:55:24.4853431Z Mismatched elements: 64 / 128 (50.0%) 2025-12-04T10:55:24.4853983Z Greatest absolute difference: 2.709859848022461 at index (126,) (up to 1e-05 allowed) 2025-12-04T10:55:24.4854692Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed) 2025-12-04T10:55:24.4855082Z 2025-12-04T10:55:24.4855216Z The failure occurred for item [0] 2025-12-04T10:55:24.4855441Z 2025-12-04T10:55:24.4855652Z To execute this test, run the following from the base repo dir: 2025-12-04T10:55:24.4856666Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4857469Z 2025-12-04T10:55:24.4857732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:55:24.4858828Z ERROR [0.0001s] inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse - AssertionError: Scalars are not equal! 2025-12-04T10:55:24.4859651Z 2025-12-04T10:55:24.4859760Z Expected 0 but got 2. 
2025-12-04T10:55:24.4860055Z Absolute difference: 2 2025-12-04T10:55:24.4860360Z Relative difference: inf 2025-12-04T10:55:24.4860740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 2 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:55:24.4861285Z ===== 1 failed, 84 passed, 2 skipped, 1 error, 2 rerun in 84.50s (0:01:24) ===== 2025-12-04T10:55:24.4861759Z Got exit code 1 2025-12-04T10:55:24.4862031Z Retrying single test... 2025-12-04T10:55:24.4862789Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml 2025-12-04T10:55:24.4863683Z ============================= test session starts ============================== 2025-12-04T10:55:24.4864339Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:55:24.4864924Z cachedir: .pytest_cache 2025-12-04T10:55:24.4865604Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:55:24.4866374Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:55:24.4866717Z configfile: pytest.ini 2025-12-04T10:55:24.4867419Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:55:24.4868295Z collecting ... collected 166 items / 165 deselected / 1 selected 2025-12-04T10:55:24.4869375Z stepcurrent: skipping 86 already run items. 
Running only test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse 2025-12-04T10:55:24.4870354Z Running 1 items in this shard 2025-12-04T10:55:24.4870559Z 2025-12-04T10:55:24.4871473Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:51.762000 86562 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4872902Z W1204 10:53:51.764000 86562 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4873599Z PASSED [6.1242s] [100%] 2025-12-04T10:55:24.4873775Z 2025-12-04T10:55:24.4874553Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml - 2025-12-04T10:55:24.4875662Z ====================== 1 passed, 165 deselected in 6.16s ======================= 2025-12-04T10:55:24.4876091Z Got exit code 0 2025-12-04T10:55:24.4876493Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T10:55:24.4877564Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml 2025-12-04T10:55:24.4878443Z ============================= test session starts ============================== 2025-12-04T10:55:24.4879089Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T10:55:24.4879676Z cachedir: .pytest_cache 2025-12-04T10:55:24.4880366Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:55:24.4881131Z rootdir: /var/lib/jenkins/workspace 2025-12-04T10:55:24.4881476Z configfile: pytest.ini 2025-12-04T10:55:24.4882251Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, 
xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:55:24.4883128Z collecting ... collected 166 items / 87 deselected / 79 selected 2025-12-04T10:55:24.4883624Z stepcurrent: skipping 87 already run items. 2025-12-04T10:55:24.4884010Z Running 79 items in this shard 2025-12-04T10:55:24.4884218Z 2025-12-04T10:55:24.4885051Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_view_fallback W1204 10:54:12.048000 86846 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4886402Z W1204 10:54:12.049000 86846 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program 2025-12-04T10:55:24.4887039Z PASSED [4.9775s] [ 1%] 2025-12-04T10:55:24.4888140Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_with_memory_plan_reuse W1204 10:54:14.125000 86846 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T10:55:24.4889282Z PASSED [2.3079s] [ 2%] 2025-12-04T10:55:24.4889934Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item PASSED [0.2993s] [ 3%] 2025-12-04T10:55:24.4891026Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero PASSED [0.3663s] [ 5%] 2025-12-04T10:55:24.4892173Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend PASSED [0.2749s] [ 6%] 2025-12-04T10:55:24.4893370Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks PASSED [0.6579s] [ 7%] 2025-12-04T10:55:24.4894413Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_index_put PASSED [0.6848s] [ 8%] 2025-12-04T10:55:24.4895360Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs PASSED [1.1972s] [ 10%] 2025-12-04T10:55:24.4896507Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_manager_per_device SKIPPED [0.0004s] 
(requires multiple cuda devices) [ 11%] 2025-12-04T10:55:24.4897553Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mark_step PASSED [0.6834s] [ 12%] 2025-12-04T10:55:24.4898418Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_meta_tensor PASSED [0.6780s] [ 13%] 2025-12-04T10:55:24.4899359Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_child_node PASSED [1.0923s] [ 15%] 2025-12-04T10:55:24.4900378Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module PASSED [0.8239s] [ 16%] 2025-12-04T10:55:24.4901700Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer PASSED [0.9133s] [ 17%] 2025-12-04T10:55:24.4902757Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_parent_node PASSED [1.1140s] [ 18%] 2025-12-04T10:55:24.4903857Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module PASSED [0.6419s] [ 20%] 2025-12-04T10:55:24.4905127Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers PASSED [0.9013s] [ 21%] 2025-12-04T10:55:24.4906331Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs PASSED [0.4832s] [ 22%] 2025-12-04T10:55:24.4908734Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multinomial SKIPPED [0.0009s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/166682 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 24%] 2025-12-04T10:55:24.4911221Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs SKIPPED [0.0002s] (requires multiple cuda devices) [ 25%] 2025-12-04T10:55:24.4912679Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor SKIPPED [0.0002s] (requires multiple cuda devices) [ 26%] 2025-12-04T10:55:24.4913935Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_insert_removal_caching PASSED [0.1995s] [ 27%] 2025-12-04T10:55:24.4915122Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs PASSED [0.3128s] [ 29%] 2025-12-04T10:55:24.4916435Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor PASSED [0.5533s] [ 30%] 2025-12-04T10:55:24.4917788Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs PASSED [0.3130s] [ 31%] 2025-12-04T10:55:24.4919182Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor PASSED [0.5585s] [ 32%] 2025-12-04T10:55:24.4920500Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs PASSED [0.3307s] [ 34%] 2025-12-04T10:55:24.4921776Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor PASSED [0.5413s] [ 35%] 2025-12-04T10:55:24.4923157Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs PASSED [0.3225s] [ 36%] 2025-12-04T10:55:24.4924507Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor PASSED [0.5309s] [ 37%] 2025-12-04T10:55:24.4925725Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs PASSED [0.3340s] [ 39%] 2025-12-04T10:55:24.4926805Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor PASSED [0.5975s] [ 40%] 2025-12-04T10:55:24.4927827Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_reinplaced PASSED [0.4281s] [ 41%] 2025-12-04T10:55:24.4928859Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address PASSED [0.8361s] [ 43%] 2025-12-04T10:55:24.4930063Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times PASSED [0.4819s] [ 44%] 2025-12-04T10:55:24.4931155Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_output_alias PASSED [0.2144s] [ 45%] 2025-12-04T10:55:24.4932096Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_peristed_output_livenes PASSED [0.3698s] [ 46%] 2025-12-04T10:55:24.4933116Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors PASSED [0.4224s] [ 48%] 2025-12-04T10:55:24.4934271Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed PASSED [0.5926s] [ 49%] 2025-12-04T10:55:24.4935291Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_non_trees PASSED [0.3138s] [ 50%] 2025-12-04T10:55:24.4936190Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_trees PASSED [0.3055s] [ 51%] 2025-12-04T10:55:24.4937045Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_run_simple PASSED [0.7736s] [ 53%] 2025-12-04T10:55:24.4937975Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_separate_recordings PASSED [0.6962s] [ 54%] 2025-12-04T10:55:24.4938975Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_side_stream_memory_allocation PASSED [0.2207s] [ 55%] 2025-12-04T10:55:24.4939965Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_single_stream_use PASSED [0.5726s] [ 56%] 2025-12-04T10:55:24.4940911Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cpp_wrapper PASSED [2.0238s] [ 58%] 2025-12-04T10:55:24.4941869Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops PASSED [0.4294s] [ 59%] 2025-12-04T10:55:24.4942928Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1 PASSED [1.1930s] [ 60%] 2025-12-04T10:55:24.4944036Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2 PASSED [11.5451s] [ 62%] 2025-12-04T10:55:24.4945035Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_symbolic PASSED [0.4468s] [ 63%] 2025-12-04T10:55:24.4945893Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_sparsity PASSED [0.3235s] [ 64%] 2025-12-04T10:55:24.4946861Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log PASSED [0.6451s] [ 65%] 2025-12-04T10:55:24.4947892Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_storage_access_error PASSED [0.2531s] [ 67%] 2025-12-04T10:55:24.4948881Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_constant_mutation PASSED [0.4722s] [ 68%] 2025-12-04T10:55:24.4949896Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint PASSED [0.2644s] [ 69%] 2025-12-04T10:55:24.4950937Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool PASSED [0.2688s] [ 70%] 2025-12-04T10:55:24.4952001Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs PASSED [0.3517s] [ 72%] 2025-12-04T10:55:24.4953109Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees PASSED [0.3401s] [ 73%] 2025-12-04T10:55:24.4954155Z 
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_trees PASSED [0.3427s] [ 74%] 2025-12-04T10:55:24.4955192Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_parameter PASSED [0.2590s] [ 75%] 2025-12-04T10:55:24.4956154Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unstable_ptr PASSED [0.4235s] [ 77%] 2025-12-04T10:55:24.4957073Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warmup_stream_sync PASSED [5.3250s] [ 78%] 2025-12-04T10:55:24.4958037Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_on_pending_backward PASSED [0.4443s] [ 79%] 2025-12-04T10:55:24.4959098Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached PASSED [1.2814s] [ 81%] 2025-12-04T10:55:24.4960640Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_workspace_allocation_error [W1204 10:55:03.283122954 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T10:55:24.4961784Z PASSED [16.3481s] [ 82%] 2025-12-04T10:55:24.4962345Z inductor/test_cudagraph_trees.py::TestSAC::test_cpu_and_cuda_rng PASSED [0.1776s] [ 83%] 2025-12-04T10:55:24.4963227Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraph_uneven_forward_backward PASSED [0.0051s] [ 84%] 2025-12-04T10:55:24.4965561Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal SKIPPED [0.0007s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/163852 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) 
[ 86%] 2025-12-04T10:55:24.4968056Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal_device_one SKIPPED [0.0002s] (requires multiple cuda devices) [ 87%] 2025-12-04T10:55:24.4969299Z inductor/test_cudagraph_trees.py::TestSAC::test_graph_partition_cudagraphs_aot_eager_compat_equal PASSED [0.6589s] [ 88%] 2025-12-04T10:55:24.4970382Z inductor/test_cudagraph_trees.py::TestSAC::test_multi_device SKIPPED [0.0003s] (requires multiple cuda devices) [ 89%] 2025-12-04T10:55:24.4971287Z inductor/test_cudagraph_trees.py::TestSAC::test_retain_graph PASSED [0.1200s] [ 91%] 2025-12-04T10:55:24.4972026Z inductor/test_cudagraph_trees.py::TestSAC::test_simple PASSED [0.2320s] [ 92%] 2025-12-04T10:55:24.4972834Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order0 PASSED [0.1448s] [ 93%] 2025-12-04T10:55:24.4973753Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order1 PASSED [0.1403s] [ 94%] 2025-12-04T10:55:24.4974665Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order2 PASSED [0.1397s] [ 96%] 2025-12-04T10:55:24.4975573Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order3 PASSED [0.1398s] [ 97%] 2025-12-04T10:55:24.4976463Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order4 PASSED [0.1392s] [ 98%] 2025-12-04T10:55:24.4977371Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order5 PASSED [0.1393s] [100%] 2025-12-04T10:55:24.4977905Z 2025-12-04T10:55:24.4978677Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml - 2025-12-04T10:55:24.4979794Z =========== 72 passed, 7 skipped, 87 deselected in 74.13s (0:01:14) ============ 2025-12-04T10:55:24.4980933Z The following tests failed and then succeeded when run in a new 
process['test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse'] 2025-12-04T10:55:24.4981865Z 2025-12-04T10:55:24.4982440Z FINISHED PRINTING LOG FILE of inductor/test_cudagraph_trees 1/1 (test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log) 2025-12-04T10:55:24.4983153Z 2025-12-04T10:55:24.4983514Z Finished inductor/test_cudagraph_trees 1/1 ... [2025-12-04 10:55:24.440612][6082.050521013], took 3.38min 2025-12-04T10:55:24.4984867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml 2025-12-04T10:55:24.5260750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml 2025-12-04T10:55:24.5525963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml 2025-12-04T10:55:24.5862176Z Running inductor/test_cuda_select_algorithm 4/5 ... [2025-12-04 10:55:24.586001][6082.195909186] 2025-12-04T10:55:24.5862795Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:55:24.5865917Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:55:24.586371] 2025-12-04T11:11:26.2613299Z 2025-12-04T11:11:26.2615388Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 4/5 (test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log) 2025-12-04T11:11:26.2616975Z W1204 10:55:33.511000 88082 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.2619029Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml 2025-12-04T11:11:26.2620165Z ============================= test session starts ============================== 2025-12-04T11:11:26.2620938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.2621538Z cachedir: .pytest_cache 2025-12-04T11:11:26.2622602Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.2623745Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.2624435Z configfile: pytest.ini 2025-12-04T11:11:26.2625581Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.2626876Z collecting ... 
collected 58 items 2025-12-04T11:11:26.2627267Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:11:26.2640409Z Running 11 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.2653858Z 2025-12-04T11:11:26.2655255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7033s] [ 9%] 2025-12-04T11:11:26.2658027Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.3988s] [ 9%] 2025-12-04T11:11:26.2660678Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.3970s] [ 9%] 2025-12-04T11:11:26.2661897Z 2025-12-04T11:11:26.2662103Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.2663476Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2664547Z Traceback (most recent call last): 2025-12-04T11:11:26.2665705Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2666998Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2668112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2669220Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2672111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2673239Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2673935Z AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.2674356Z 2025-12-04T11:11:26.2674518Z Expected 1 but got 2. 2025-12-04T11:11:26.2674999Z Absolute difference: 1 2025-12-04T11:11:26.2675431Z Relative difference: 1.0 2025-12-04T11:11:26.2675655Z 2025-12-04T11:11:26.2676016Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2678007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2679528Z 2025-12-04T11:11:26.2679941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2680875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2681644Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2682690Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2684004Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2684730Z graph_break [] 2025-12-04T11:11:26.2685296Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2687114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2688547Z warnings.warn( 2025-12-04T11:11:26.2689898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2691393Z warnings.warn( 2025-12-04T11:11:26.2692381Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2693413Z Traceback (most recent call last): 
2025-12-04T11:11:26.2694282Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2695273Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2696152Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2696906Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2697733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2698600Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2699075Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2699320Z 2025-12-04T11:11:26.2699428Z Expected 1 but got 2. 2025-12-04T11:11:26.2699716Z Absolute difference: 1 2025-12-04T11:11:26.2700009Z Relative difference: 1.0 2025-12-04T11:11:26.2700197Z 2025-12-04T11:11:26.2700408Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2702041Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2703077Z 2025-12-04T11:11:26.2703340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2704020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2704488Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2705223Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2706165Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2706629Z graph_break [] 2025-12-04T11:11:26.2706987Z ----------------------------- 
Captured stderr call ----------------------------- 2025-12-04T11:11:26.2708077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2709034Z warnings.warn( 2025-12-04T11:11:26.2709898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2710844Z warnings.warn( 2025-12-04T11:11:26.2711221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2711698Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2712125Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2713002Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2713756Z graph_break [] 2025-12-04T11:11:26.2714109Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2715182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2716128Z warnings.warn( 2025-12-04T11:11:26.2717001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2717926Z warnings.warn( 2025-12-04T11:11:26.2718234Z =================================== FAILURES =================================== 2025-12-04T11:11:26.2719027Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2719782Z Traceback (most recent call last): 2025-12-04T11:11:26.2720506Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2721365Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2722256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2723000Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2723824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2724697Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2725172Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2725418Z 2025-12-04T11:11:26.2725522Z Expected 1 but got 2. 2025-12-04T11:11:26.2725807Z Absolute difference: 1 2025-12-04T11:11:26.2726100Z Relative difference: 1.0 2025-12-04T11:11:26.2726286Z 2025-12-04T11:11:26.2726494Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2727808Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2728834Z 2025-12-04T11:11:26.2729097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2729715Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2730237Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2730969Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2731884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2732344Z graph_break [] 2025-12-04T11:11:26.2732699Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.2733773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2734724Z warnings.warn( 2025-12-04T11:11:26.2735572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2736516Z warnings.warn( 2025-12-04T11:11:26.2736893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2737360Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2737780Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2738668Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2739416Z graph_break [] 2025-12-04T11:11:26.2739769Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2740841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2741889Z warnings.warn( 2025-12-04T11:11:26.2742876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2743810Z warnings.warn( 2025-12-04T11:11:26.2744193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2744665Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2745103Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2745970Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), 
('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2746721Z graph_break [] 2025-12-04T11:11:26.2747091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2748147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2749094Z warnings.warn( 2025-12-04T11:11:26.2749961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2750904Z warnings.warn( 2025-12-04T11:11:26.2751875Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml - 2025-12-04T11:11:26.2753011Z =========================== short test summary info ============================ 2025-12-04T11:11:26.2754244Z FAILED [0.3970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2755282Z 2025-12-04T11:11:26.2755400Z Expected 1 but got 2. 2025-12-04T11:11:26.2755768Z Absolute difference: 1 2025-12-04T11:11:26.2756073Z Relative difference: 1.0 2025-12-04T11:11:26.2756261Z 2025-12-04T11:11:26.2756494Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2758718Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2759743Z 2025-12-04T11:11:26.2760044Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2760630Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.2761118Z ========================== 1 failed, 2 rerun in 4.53s ========================== 2025-12-04T11:11:26.2761600Z Got exit code 1 2025-12-04T11:11:26.2761856Z Retrying single test... 2025-12-04T11:11:26.2762489Z W1204 10:55:53.043000 88251 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.2763718Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml 2025-12-04T11:11:26.2764658Z ============================= test session starts ============================== 2025-12-04T11:11:26.2765321Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.2765919Z cachedir: .pytest_cache 2025-12-04T11:11:26.2766628Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.2767394Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.2767756Z configfile: pytest.ini 2025-12-04T11:11:26.2768475Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.2769356Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.2770665Z stepcurrent: skipping 0 already run items. 
Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2771880Z Running 1 items in this shard 2025-12-04T11:11:26.2772085Z 2025-12-04T11:11:26.2773351Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:55:56.153388380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2774740Z 2025-12-04T11:11:26.2775265Z [W1204 10:56:12.736282206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2775910Z 2025-12-04T11:11:26.2776434Z [W1204 10:56:12.736533603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2777074Z 2025-12-04T11:11:26.2777573Z [W1204 10:56:12.743718646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2778224Z 2025-12-04T11:11:26.2778726Z [W1204 10:56:12.744409162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2779374Z 2025-12-04T11:11:26.2779925Z [W1204 10:56:12.744596111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2780560Z 2025-12-04T11:11:26.2781076Z [W1204 10:56:12.751482569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2781713Z 2025-12-04T11:11:26.2782334Z [W1204 10:56:12.752162629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2782971Z 2025-12-04T11:11:26.2783471Z [W1204 10:56:12.752347189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2784162Z 2025-12-04T11:11:26.2784661Z [W1204 10:56:14.696887963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2785314Z 2025-12-04T11:11:26.2785818Z [W1204 10:56:14.698621794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2786494Z 2025-12-04T11:11:26.2786997Z [W1204 10:56:14.698825805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2787631Z 2025-12-04T11:11:26.2788154Z [W1204 10:56:14.702704605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2788791Z 2025-12-04T11:11:26.2789305Z [W1204 10:56:14.703335993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2789937Z 2025-12-04T11:11:26.2790445Z [W1204 10:56:14.703532011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2791091Z 2025-12-04T11:11:26.2791596Z [W1204 10:56:14.709440101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2792244Z 2025-12-04T11:11:26.2792750Z [W1204 10:56:14.710084747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2793386Z 2025-12-04T11:11:26.2793909Z [W1204 10:56:14.710287979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2794546Z 2025-12-04T11:11:26.2794691Z ('RERUN', {'yellow': True}) [19.3198s] [100%] 2025-12-04T11:11:26.2796183Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:14.063034583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2797561Z 2025-12-04T11:11:26.2798067Z [W1204 10:56:14.063798034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2798714Z 2025-12-04T11:11:26.2799219Z [W1204 10:56:14.063994248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2799851Z 2025-12-04T11:11:26.2800360Z [W1204 10:56:14.067847114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2801191Z 2025-12-04T11:11:26.2801767Z [W1204 10:56:14.068610926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2802403Z 2025-12-04T11:11:26.2802907Z [W1204 10:56:14.068798938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2803554Z 2025-12-04T11:11:26.2804053Z [W1204 10:56:14.074761041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2804701Z 2025-12-04T11:11:26.2805203Z [W1204 10:56:14.075378078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2805835Z 2025-12-04T11:11:26.2806353Z [W1204 10:56:14.075562586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2806985Z 2025-12-04T11:11:26.2807642Z [W1204 10:56:14.163880897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2808279Z 2025-12-04T11:11:26.2808801Z [W1204 10:56:14.164673307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2809502Z 2025-12-04T11:11:26.2810004Z [W1204 10:56:14.164883147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2810697Z 2025-12-04T11:11:26.2811199Z [W1204 10:56:14.168769227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2811832Z 2025-12-04T11:11:26.2812350Z [W1204 10:56:14.169395799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2812988Z 2025-12-04T11:11:26.2813510Z [W1204 10:56:14.169587425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2814143Z 2025-12-04T11:11:26.2814642Z [W1204 10:56:14.175566612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2815292Z 2025-12-04T11:11:26.2815793Z [W1204 10:56:14.176376114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2816447Z 2025-12-04T11:11:26.2816948Z [W1204 10:56:14.176566289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2817586Z 2025-12-04T11:11:26.2817731Z ('RERUN', {'yellow': True}) [0.4280s] [100%] 2025-12-04T11:11:26.2819233Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:14.471158691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2820596Z 2025-12-04T11:11:26.2821101Z [W1204 10:56:14.471911341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2821751Z 2025-12-04T11:11:26.2822252Z [W1204 10:56:14.472107232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2822900Z 2025-12-04T11:11:26.2823403Z [W1204 10:56:14.475990930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2824051Z 2025-12-04T11:11:26.2824555Z [W1204 10:56:14.476765413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2825193Z 2025-12-04T11:11:26.2825710Z [W1204 10:56:14.476953296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2826346Z 2025-12-04T11:11:26.2826862Z [W1204 10:56:14.482869398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2827498Z 2025-12-04T11:11:26.2828002Z [W1204 10:56:14.483491792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2828653Z 2025-12-04T11:11:26.2829157Z [W1204 10:56:14.483674941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2829803Z 2025-12-04T11:11:26.2830303Z [W1204 10:56:14.569019750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2830935Z 2025-12-04T11:11:26.2831531Z [W1204 10:56:14.569751367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2832166Z 2025-12-04T11:11:26.2832684Z [W1204 10:56:14.569950798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2833347Z 2025-12-04T11:11:26.2833847Z [W1204 10:56:14.573797332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2834493Z 2025-12-04T11:11:26.2835024Z [W1204 10:56:14.574424349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2835670Z 2025-12-04T11:11:26.2836173Z [W1204 10:56:14.574615260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2836804Z 2025-12-04T11:11:26.2837321Z [W1204 10:56:14.580475550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2837957Z 2025-12-04T11:11:26.2838471Z [W1204 10:56:14.581239295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2839107Z 2025-12-04T11:11:26.2839606Z [W1204 10:56:14.581428241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2840250Z 2025-12-04T11:11:26.2840349Z FAILED [0.4031s] [100%] 2025-12-04T11:11:26.2840535Z 2025-12-04T11:11:26.2840677Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.2841518Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2842263Z Traceback (most recent call last): 2025-12-04T11:11:26.2843006Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2843880Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2844697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2845442Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2846275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2847145Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2847601Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2847860Z 2025-12-04T11:11:26.2847965Z Expected 1 but got 2. 
2025-12-04T11:11:26.2848250Z Absolute difference: 1 2025-12-04T11:11:26.2848538Z Relative difference: 1.0 2025-12-04T11:11:26.2848724Z 2025-12-04T11:11:26.2848936Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2850173Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2851185Z 2025-12-04T11:11:26.2851465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2852087Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2852550Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2853282Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2854170Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2854623Z graph_break [] 2025-12-04T11:11:26.2854996Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2856615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.2858053Z if out == self.unknown_value: 2025-12-04T11:11:26.2859012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2859974Z warnings.warn( 2025-12-04T11:11:26.2860849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2861832Z warnings.warn( 2025-12-04T11:11:26.2862480Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2863245Z Traceback (most recent call last): 2025-12-04T11:11:26.2863993Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2864849Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2865657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2866410Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2867227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2868084Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2868552Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2868797Z 2025-12-04T11:11:26.2868916Z Expected 1 but got 2. 
2025-12-04T11:11:26.2869200Z Absolute difference: 1 2025-12-04T11:11:26.2869481Z Relative difference: 1.0 2025-12-04T11:11:26.2869677Z 2025-12-04T11:11:26.2869884Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2871115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2872122Z 2025-12-04T11:11:26.2872385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2873001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2873469Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2874204Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2875126Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2875592Z graph_break [] 2025-12-04T11:11:26.2875958Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2877502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.2878926Z if out == self.unknown_value: 2025-12-04T11:11:26.2879858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2880834Z warnings.warn( 2025-12-04T11:11:26.2881768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2882737Z warnings.warn( 2025-12-04T11:11:26.2883119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2883620Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2884162Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2885048Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2885843Z graph_break [] 2025-12-04T11:11:26.2886216Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2887269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2888259Z warnings.warn( 2025-12-04T11:11:26.2889119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2890049Z warnings.warn( 2025-12-04T11:11:26.2890359Z =================================== FAILURES =================================== 2025-12-04T11:11:26.2891185Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.2891939Z Traceback (most recent call last): 2025-12-04T11:11:26.2892694Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.2893594Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.2894409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.2895165Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.2895975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.2896853Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.2897358Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2897608Z 2025-12-04T11:11:26.2897716Z Expected 1 but got 2. 2025-12-04T11:11:26.2898003Z Absolute difference: 1 2025-12-04T11:11:26.2898299Z Relative difference: 1.0 2025-12-04T11:11:26.2898487Z 2025-12-04T11:11:26.2898710Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2899936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2901154Z 2025-12-04T11:11:26.2901424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2902061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2902536Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2903267Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2904171Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2904632Z graph_break [] 2025-12-04T11:11:26.2904988Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.2906528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.2907969Z if out == self.unknown_value: 2025-12-04T11:11:26.2908903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2909836Z warnings.warn( 2025-12-04T11:11:26.2910712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2911653Z warnings.warn( 2025-12-04T11:11:26.2912201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2912668Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2913106Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2914040Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2914779Z graph_break [] 2025-12-04T11:11:26.2915152Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2916274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2917227Z warnings.warn( 2025-12-04T11:11:26.2918092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2919041Z warnings.warn( 2025-12-04T11:11:26.2919417Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.2919872Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.2920315Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.2921195Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.2922028Z graph_break [] 2025-12-04T11:11:26.2922384Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.2923457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2924399Z warnings.warn( 2025-12-04T11:11:26.2925288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.2926223Z warnings.warn( 2025-12-04T11:11:26.2927216Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml - 2025-12-04T11:11:26.2928349Z =========================== short test summary info ============================ 2025-12-04T11:11:26.2929594Z FAILED [0.4031s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.2930627Z 2025-12-04T11:11:26.2930734Z Expected 1 but got 2. 
2025-12-04T11:11:26.2931026Z Absolute difference: 1 2025-12-04T11:11:26.2931324Z Relative difference: 1.0 2025-12-04T11:11:26.2931514Z 2025-12-04T11:11:26.2931725Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.2932965Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2933994Z 2025-12-04T11:11:26.2934258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.2934840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.2935351Z ================== 1 failed, 10 deselected, 2 rerun in 20.18s ================== 2025-12-04T11:11:26.2935800Z Got exit code 1 2025-12-04T11:11:26.2936067Z Retrying single test... 2025-12-04T11:11:26.2936688Z W1204 10:56:26.002000 88425 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.2937894Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml 2025-12-04T11:11:26.2938971Z ============================= test session starts ============================== 2025-12-04T11:11:26.2939621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.2940204Z cachedir: .pytest_cache 2025-12-04T11:11:26.2940941Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.2941714Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.2942065Z configfile: pytest.ini 2025-12-04T11:11:26.2942800Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.2943674Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.2944986Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.2946183Z Running 1 items in this shard 2025-12-04T11:11:26.2946387Z 2025-12-04T11:11:26.2947631Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:29.098216967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2949022Z 2025-12-04T11:11:26.2949529Z [W1204 10:56:44.869871539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2950180Z 2025-12-04T11:11:26.2950683Z [W1204 10:56:44.870155439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2951337Z 2025-12-04T11:11:26.2951843Z [W1204 10:56:44.877374852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2952477Z 2025-12-04T11:11:26.2952993Z [W1204 10:56:44.878093244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2953629Z 2025-12-04T11:11:26.2954130Z [W1204 10:56:44.878288939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2954774Z 2025-12-04T11:11:26.2955279Z [W1204 10:56:44.885080664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2955926Z 2025-12-04T11:11:26.2956425Z [W1204 10:56:44.885714402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2957074Z 2025-12-04T11:11:26.2957577Z [W1204 10:56:44.885893376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2958210Z 2025-12-04T11:11:26.2958731Z [W1204 10:56:46.828084914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2959363Z 2025-12-04T11:11:26.2959878Z [W1204 10:56:46.829787664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2960513Z 2025-12-04T11:11:26.2961009Z [W1204 10:56:46.830015007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2961730Z 2025-12-04T11:11:26.2962229Z [W1204 10:56:46.833891649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2962872Z 2025-12-04T11:11:26.2963377Z [W1204 10:56:46.834533565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2964009Z 2025-12-04T11:11:26.2964620Z [W1204 10:56:46.834721693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2965257Z 2025-12-04T11:11:26.2965774Z [W1204 10:56:46.840617180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2966443Z 2025-12-04T11:11:26.2966941Z [W1204 10:56:46.841238598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2967621Z 2025-12-04T11:11:26.2968120Z [W1204 10:56:46.841425360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2968767Z 2025-12-04T11:11:26.2968898Z ('RERUN', {'yellow': True}) [18.5034s] [100%] 2025-12-04T11:11:26.2970402Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:46.197235976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2971762Z 2025-12-04T11:11:26.2972284Z [W1204 10:56:46.197981154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2972919Z 2025-12-04T11:11:26.2973418Z [W1204 10:56:46.198202172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2974067Z 2025-12-04T11:11:26.2974572Z [W1204 10:56:46.202020570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2975219Z 2025-12-04T11:11:26.2975721Z [W1204 10:56:46.202789913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2976430Z 2025-12-04T11:11:26.2977186Z [W1204 10:56:46.202975955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2977825Z 2025-12-04T11:11:26.2978336Z [W1204 10:56:46.208781567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2978972Z 2025-12-04T11:11:26.2979471Z [W1204 10:56:46.209373872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2980123Z 2025-12-04T11:11:26.2980622Z [W1204 10:56:46.209552665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2981267Z 2025-12-04T11:11:26.2981766Z [W1204 10:56:46.294293671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2982397Z 2025-12-04T11:11:26.2982913Z [W1204 10:56:46.295037406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2983544Z 2025-12-04T11:11:26.2984057Z [W1204 10:56:46.295237725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2984690Z 2025-12-04T11:11:26.2985187Z [W1204 10:56:46.299026941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2985835Z 2025-12-04T11:11:26.2986331Z [W1204 10:56:46.299638874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2986972Z 2025-12-04T11:11:26.2987469Z [W1204 10:56:46.299827696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2988114Z 2025-12-04T11:11:26.2988692Z [W1204 10:56:46.305687767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2989328Z 2025-12-04T11:11:26.2989842Z [W1204 10:56:46.306462577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.2990507Z 2025-12-04T11:11:26.2991022Z [W1204 10:56:46.306649558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2991654Z 2025-12-04T11:11:26.2991816Z ('RERUN', {'yellow': True}) [0.4263s] [100%] 2025-12-04T11:11:26.2993316Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:46.597767375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2994693Z 2025-12-04T11:11:26.2995197Z [W1204 10:56:46.598513684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2995833Z 2025-12-04T11:11:26.2996347Z [W1204 10:56:46.598708987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2996982Z 2025-12-04T11:11:26.2997495Z [W1204 10:56:46.602570838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2998126Z 2025-12-04T11:11:26.2998627Z [W1204 10:56:46.603376467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.2999273Z 2025-12-04T11:11:26.2999770Z [W1204 10:56:46.603561483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3000412Z 2025-12-04T11:11:26.3001110Z [W1204 10:56:46.609423164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3001811Z 2025-12-04T11:11:26.3002324Z [W1204 10:56:46.610075409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3002959Z 2025-12-04T11:11:26.3003476Z [W1204 10:56:46.610276998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3004105Z 2025-12-04T11:11:26.3004603Z [W1204 10:56:47.695954333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3005251Z 2025-12-04T11:11:26.3005750Z [W1204 10:56:47.696694028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3006400Z 2025-12-04T11:11:26.3006904Z [W1204 10:56:47.696909425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3007538Z 2025-12-04T11:11:26.3008049Z [W1204 10:56:47.700722824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3008685Z 2025-12-04T11:11:26.3009201Z [W1204 10:56:47.701338652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3009837Z 2025-12-04T11:11:26.3010336Z [W1204 10:56:47.701526798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3010986Z 2025-12-04T11:11:26.3011485Z [W1204 10:56:47.707331907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3012132Z 2025-12-04T11:11:26.3012630Z [W1204 10:56:47.708092736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3013404Z 2025-12-04T11:11:26.3013917Z [W1204 10:56:47.708278729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3014546Z 2025-12-04T11:11:26.3014704Z FAILED [0.3998s] [100%] 2025-12-04T11:11:26.3014877Z 2025-12-04T11:11:26.3015020Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3015800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3016626Z Traceback (most recent call last): 2025-12-04T11:11:26.3017367Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3018215Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3019027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3019783Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3020594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3021464Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3021933Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3022182Z 2025-12-04T11:11:26.3022303Z Expected 1 but got 2. 
2025-12-04T11:11:26.3022576Z Absolute difference: 1 2025-12-04T11:11:26.3022872Z Relative difference: 1.0 2025-12-04T11:11:26.3023058Z 2025-12-04T11:11:26.3023283Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3024507Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3025535Z 2025-12-04T11:11:26.3025802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3026424Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3026901Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3027630Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3028512Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3028984Z graph_break [] 2025-12-04T11:11:26.3029360Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3030891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3032331Z if out == self.unknown_value: 2025-12-04T11:11:26.3033276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3034230Z warnings.warn( 2025-12-04T11:11:26.3035095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3036045Z warnings.warn( 2025-12-04T11:11:26.3036708Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3037461Z Traceback (most recent call last): 2025-12-04T11:11:26.3038198Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3039061Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3039965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3040700Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3041605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3042517Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3042983Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3043229Z 2025-12-04T11:11:26.3043335Z Expected 1 but got 2. 
2025-12-04T11:11:26.3043656Z Absolute difference: 1 2025-12-04T11:11:26.3043945Z Relative difference: 1.0 2025-12-04T11:11:26.3044131Z 2025-12-04T11:11:26.3044338Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3045571Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3046591Z 2025-12-04T11:11:26.3046855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3047470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3047932Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3048668Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3049545Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3050007Z graph_break [] 2025-12-04T11:11:26.3050358Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3051892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3053323Z if out == self.unknown_value: 2025-12-04T11:11:26.3054248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3055179Z warnings.warn( 2025-12-04T11:11:26.3056046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3056992Z warnings.warn( 2025-12-04T11:11:26.3057358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3057827Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3058270Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3059150Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3059894Z graph_break [] 2025-12-04T11:11:26.3060268Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3061337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3062270Z warnings.warn( 2025-12-04T11:11:26.3063139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3064083Z warnings.warn( 2025-12-04T11:11:26.3064387Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3065154Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3065903Z Traceback (most recent call last): 2025-12-04T11:11:26.3066723Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3067585Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3068381Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3069160Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3069982Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3070867Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3071338Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3071598Z 2025-12-04T11:11:26.3071701Z Expected 1 but got 2. 2025-12-04T11:11:26.3071985Z Absolute difference: 1 2025-12-04T11:11:26.3072261Z Relative difference: 1.0 2025-12-04T11:11:26.3072462Z 2025-12-04T11:11:26.3072671Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3073900Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3074907Z 2025-12-04T11:11:26.3075182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3075789Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3076256Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3076990Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3077856Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3078316Z graph_break [] 2025-12-04T11:11:26.3078681Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3080222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3081709Z if out == self.unknown_value: 2025-12-04T11:11:26.3082649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3083608Z warnings.warn( 2025-12-04T11:11:26.3084483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3085412Z warnings.warn( 2025-12-04T11:11:26.3085786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3086259Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3086681Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3087559Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3088309Z graph_break [] 2025-12-04T11:11:26.3088679Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3089728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3090679Z warnings.warn( 2025-12-04T11:11:26.3091559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3092497Z warnings.warn( 2025-12-04T11:11:26.3092861Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3093331Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3093858Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3094730Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3095518Z graph_break [] 2025-12-04T11:11:26.3095896Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3096966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3097961Z warnings.warn( 2025-12-04T11:11:26.3098835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3099781Z warnings.warn( 2025-12-04T11:11:26.3100758Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml - 2025-12-04T11:11:26.3102090Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3103325Z FAILED [0.3998s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3104360Z 2025-12-04T11:11:26.3104481Z Expected 1 but got 2. 
2025-12-04T11:11:26.3104760Z Absolute difference: 1 2025-12-04T11:11:26.3105059Z Relative difference: 1.0 2025-12-04T11:11:26.3105263Z 2025-12-04T11:11:26.3105477Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3106712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3107719Z 2025-12-04T11:11:26.3107988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3108573Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3109087Z ================== 1 failed, 10 deselected, 2 rerun in 19.36s ================== 2025-12-04T11:11:26.3109522Z Got exit code 1 2025-12-04T11:11:26.3110463Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3111800Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.3112780Z W1204 10:56:58.174000 88599 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3113989Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml 2025-12-04T11:11:26.3114920Z ============================= test session starts ============================== 2025-12-04T11:11:26.3115569Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3116161Z cachedir: .pytest_cache 2025-12-04T11:11:26.3116845Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.3117616Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3117963Z configfile: pytest.ini 2025-12-04T11:11:26.3118678Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3119536Z collecting ... collected 58 items / 1 deselected / 57 selected 2025-12-04T11:11:26.3120020Z stepcurrent: skipping 1 already run items. 2025-12-04T11:11:26.3120400Z Running 10 items in this shard 2025-12-04T11:11:26.3120601Z 2025-12-04T11:11:26.3121678Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7192s] [ 10%] 2025-12-04T11:11:26.3123485Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4016s] [ 10%] 2025-12-04T11:11:26.3125283Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4080s] [ 10%] 2025-12-04T11:11:26.3126242Z 2025-12-04T11:11:26.3126382Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3127162Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3127902Z Traceback (most recent call last): 2025-12-04T11:11:26.3128639Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3129503Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3130322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3131060Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3131887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3132755Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3133210Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3133471Z 2025-12-04T11:11:26.3133576Z Expected 1 but got 2. 2025-12-04T11:11:26.3133864Z Absolute difference: 1 2025-12-04T11:11:26.3134153Z Relative difference: 1.0 2025-12-04T11:11:26.3134342Z 2025-12-04T11:11:26.3134553Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3135785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3136807Z 2025-12-04T11:11:26.3137066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3137684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3138141Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3138872Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3139750Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3140191Z graph_break [] 2025-12-04T11:11:26.3140564Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3141644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3142595Z warnings.warn( 2025-12-04T11:11:26.3143459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3144404Z warnings.warn( 2025-12-04T11:11:26.3145067Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3145824Z Traceback (most recent call last): 2025-12-04T11:11:26.3146548Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3147411Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3148284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3149022Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3149842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3150744Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3151213Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3151457Z 2025-12-04T11:11:26.3151595Z Expected 1 but got 2. 
2025-12-04T11:11:26.3151879Z Absolute difference: 1 2025-12-04T11:11:26.3152170Z Relative difference: 1.0 2025-12-04T11:11:26.3152356Z 2025-12-04T11:11:26.3152567Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3153802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3154829Z 2025-12-04T11:11:26.3155088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3155706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3156163Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3156889Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3157760Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3158220Z graph_break [] 2025-12-04T11:11:26.3158572Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3159639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3160590Z warnings.warn( 2025-12-04T11:11:26.3161535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3162474Z warnings.warn( 2025-12-04T11:11:26.3162851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3163326Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3163754Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3164634Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3165391Z graph_break [] 2025-12-04T11:11:26.3165757Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3166817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3167767Z warnings.warn( 2025-12-04T11:11:26.3168641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3201132Z warnings.warn( 2025-12-04T11:11:26.3201672Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3202544Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3203339Z Traceback (most recent call last): 2025-12-04T11:11:26.3204077Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3204945Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3205763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3206713Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3207543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3208412Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3208943Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3209191Z 2025-12-04T11:11:26.3209301Z Expected 1 but got 2. 
2025-12-04T11:11:26.3209589Z Absolute difference: 1 2025-12-04T11:11:26.3209884Z Relative difference: 1.0 2025-12-04T11:11:26.3210123Z 2025-12-04T11:11:26.3210328Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3211543Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3212554Z 2025-12-04T11:11:26.3212812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3213412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3213857Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3214572Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3215430Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3215862Z graph_break [] 2025-12-04T11:11:26.3216215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3217268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3218217Z warnings.warn( 2025-12-04T11:11:26.3219069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3219989Z warnings.warn( 2025-12-04T11:11:26.3220354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3220801Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3221212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3222071Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3222811Z graph_break [] 2025-12-04T11:11:26.3223154Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3224205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3225133Z warnings.warn( 2025-12-04T11:11:26.3225989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3226919Z warnings.warn( 2025-12-04T11:11:26.3227277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3227729Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3228146Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3229002Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3229742Z graph_break [] 2025-12-04T11:11:26.3230096Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3231139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3232063Z warnings.warn( 2025-12-04T11:11:26.3232984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3233915Z warnings.warn( 2025-12-04T11:11:26.3234957Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml - 2025-12-04T11:11:26.3236053Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3237313Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3238348Z 2025-12-04T11:11:26.3238460Z Expected 1 but got 2. 2025-12-04T11:11:26.3238727Z Absolute difference: 1 2025-12-04T11:11:26.3239012Z Relative difference: 1.0 2025-12-04T11:11:26.3239199Z 2025-12-04T11:11:26.3239430Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3240635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3241724Z 2025-12-04T11:11:26.3241976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3242538Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3243026Z =================== 1 failed, 1 deselected, 2 rerun in 4.56s =================== 2025-12-04T11:11:26.3243428Z Got exit code 1 2025-12-04T11:11:26.3243681Z Retrying single test... 
2025-12-04T11:11:26.3244306Z W1204 10:57:17.701000 88768 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3245531Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml 2025-12-04T11:11:26.3246469Z ============================= test session starts ============================== 2025-12-04T11:11:26.3247120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3247711Z cachedir: .pytest_cache 2025-12-04T11:11:26.3248396Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3249157Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3249481Z configfile: pytest.ini 2025-12-04T11:11:26.3250185Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3251036Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3252327Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3253513Z Running 1 items in this shard 2025-12-04T11:11:26.3253709Z 2025-12-04T11:11:26.3254953Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:21.781846996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3256331Z 2025-12-04T11:11:26.3256844Z [W1204 10:57:36.335603287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3257473Z 2025-12-04T11:11:26.3257976Z [W1204 10:57:36.335851769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3258610Z 2025-12-04T11:11:26.3259177Z [W1204 10:57:36.343104974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3259815Z 2025-12-04T11:11:26.3260307Z [W1204 10:57:36.343850134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3260973Z 2025-12-04T11:11:26.3261479Z [W1204 10:57:36.344034881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3262139Z 2025-12-04T11:11:26.3262644Z [W1204 10:57:36.350770590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3263273Z 2025-12-04T11:11:26.3263770Z [W1204 10:57:36.351418961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3264408Z 2025-12-04T11:11:26.3264911Z [W1204 10:57:36.351598410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3265547Z 2025-12-04T11:11:26.3266036Z [W1204 10:57:38.293525572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3266668Z 2025-12-04T11:11:26.3267175Z [W1204 10:57:38.295232563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3267806Z 2025-12-04T11:11:26.3268316Z [W1204 10:57:38.295432008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3268947Z 2025-12-04T11:11:26.3269446Z [W1204 10:57:38.299216828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3270088Z 2025-12-04T11:11:26.3270583Z [W1204 10:57:38.299829292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3271218Z 2025-12-04T11:11:26.3271713Z [W1204 10:57:38.300046668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3272340Z 2025-12-04T11:11:26.3272848Z [W1204 10:57:38.305861917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3273483Z 2025-12-04T11:11:26.3273981Z [W1204 10:57:38.306469701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3274606Z 2025-12-04T11:11:26.3275105Z [W1204 10:57:38.306654886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3275748Z 2025-12-04T11:11:26.3275870Z ('RERUN', {'yellow': True}) [19.2777s] [100%] 2025-12-04T11:11:26.3277365Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:39.659886477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3278731Z 2025-12-04T11:11:26.3279237Z [W1204 10:57:39.660682635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3279863Z 2025-12-04T11:11:26.3280435Z [W1204 10:57:39.660887393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3281063Z 2025-12-04T11:11:26.3281624Z [W1204 10:57:39.664757815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3282263Z 2025-12-04T11:11:26.3282835Z [W1204 10:57:39.665543509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3283473Z 2025-12-04T11:11:26.3283975Z [W1204 10:57:39.665728223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3284666Z 2025-12-04T11:11:26.3285165Z [W1204 10:57:39.671637399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3285790Z 2025-12-04T11:11:26.3286296Z [W1204 10:57:39.672251490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3286956Z 2025-12-04T11:11:26.3287461Z [W1204 10:57:39.672430928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3288107Z 2025-12-04T11:11:26.3288608Z [W1204 10:57:39.757117197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3289257Z 2025-12-04T11:11:26.3289756Z [W1204 10:57:39.757834976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3290404Z 2025-12-04T11:11:26.3290907Z [W1204 10:57:39.758031035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3291534Z 2025-12-04T11:11:26.3292049Z [W1204 10:57:39.761811743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3292682Z 2025-12-04T11:11:26.3293191Z [W1204 10:57:39.762433214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3293820Z 2025-12-04T11:11:26.3294316Z [W1204 10:57:39.762619013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3294964Z 2025-12-04T11:11:26.3295466Z [W1204 10:57:39.768391914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3296114Z 2025-12-04T11:11:26.3296612Z [W1204 10:57:39.769143292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3297250Z 2025-12-04T11:11:26.3297759Z [W1204 10:57:39.769330098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3298393Z 2025-12-04T11:11:26.3298533Z ('RERUN', {'yellow': True}) [0.4232s] [100%] 2025-12-04T11:11:26.3300020Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:39.058944203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3301599Z 2025-12-04T11:11:26.3302112Z [W1204 10:57:39.059688112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3302770Z 2025-12-04T11:11:26.3303271Z [W1204 10:57:39.059883102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3303907Z 2025-12-04T11:11:26.3304425Z [W1204 10:57:39.063704186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3305056Z 2025-12-04T11:11:26.3305571Z [W1204 10:57:39.064482534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3306202Z 2025-12-04T11:11:26.3306702Z [W1204 10:57:39.064667513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3307348Z 2025-12-04T11:11:26.3307968Z [W1204 10:57:39.070634801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3308615Z 2025-12-04T11:11:26.3309111Z [W1204 10:57:39.071281120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3309789Z 2025-12-04T11:11:26.3310301Z [W1204 10:57:39.071464672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3310971Z 2025-12-04T11:11:26.3311482Z [W1204 10:57:39.158411652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3312112Z 2025-12-04T11:11:26.3312609Z [W1204 10:57:39.159150971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3313259Z 2025-12-04T11:11:26.3313759Z [W1204 10:57:39.159351301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3314404Z 2025-12-04T11:11:26.3314902Z [W1204 10:57:39.163172391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3315535Z 2025-12-04T11:11:26.3316049Z [W1204 10:57:39.163785695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3316687Z 2025-12-04T11:11:26.3317205Z [W1204 10:57:39.163971763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3317841Z 2025-12-04T11:11:26.3318344Z [W1204 10:57:39.169774134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3318992Z 2025-12-04T11:11:26.3319496Z [W1204 10:57:39.170565570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3320143Z 2025-12-04T11:11:26.3320643Z [W1204 10:57:39.170758653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3321290Z 2025-12-04T11:11:26.3321388Z FAILED [0.3998s] [100%] 2025-12-04T11:11:26.3321622Z 2025-12-04T11:11:26.3321781Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3322551Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3323302Z Traceback (most recent call last): 2025-12-04T11:11:26.3324044Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3324905Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3325714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3326468Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3327288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3328143Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3328606Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3328863Z 2025-12-04T11:11:26.3328972Z Expected 1 but got 2. 
2025-12-04T11:11:26.3329257Z Absolute difference: 1 2025-12-04T11:11:26.3329530Z Relative difference: 1.0 2025-12-04T11:11:26.3329726Z 2025-12-04T11:11:26.3329938Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3331247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3332263Z 2025-12-04T11:11:26.3332538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3333150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3333651Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3334385Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3335253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3335737Z graph_break [] 2025-12-04T11:11:26.3336103Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3337650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3339066Z if out == self.unknown_value: 2025-12-04T11:11:26.3339993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3340944Z warnings.warn( 2025-12-04T11:11:26.3341815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3342749Z warnings.warn( 2025-12-04T11:11:26.3343407Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3344159Z Traceback (most recent call last): 2025-12-04T11:11:26.3344884Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3345749Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3346562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3347307Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3348113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3348973Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3349434Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3349679Z 2025-12-04T11:11:26.3349795Z Expected 1 but got 2. 
2025-12-04T11:11:26.3350066Z Absolute difference: 1 2025-12-04T11:11:26.3350356Z Relative difference: 1.0 2025-12-04T11:11:26.3350540Z 2025-12-04T11:11:26.3350760Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3351978Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3353002Z 2025-12-04T11:11:26.3353266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3353883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3354354Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3355074Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3355956Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3356413Z graph_break [] 2025-12-04T11:11:26.3356767Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3358364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3359788Z if out == self.unknown_value: 2025-12-04T11:11:26.3360725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3361762Z warnings.warn( 2025-12-04T11:11:26.3362632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3363614Z warnings.warn( 2025-12-04T11:11:26.3363986Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3364466Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3364884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3365765Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3366515Z graph_break [] 2025-12-04T11:11:26.3366865Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3367929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3368871Z warnings.warn( 2025-12-04T11:11:26.3369740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3370670Z warnings.warn( 2025-12-04T11:11:26.3370974Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3371759Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3372493Z Traceback (most recent call last): 2025-12-04T11:11:26.3373227Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3374092Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3374903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3375641Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3376457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3377316Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3377772Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3378012Z 2025-12-04T11:11:26.3378114Z Expected 1 but got 2. 2025-12-04T11:11:26.3378392Z Absolute difference: 1 2025-12-04T11:11:26.3378679Z Relative difference: 1.0 2025-12-04T11:11:26.3378866Z 2025-12-04T11:11:26.3379075Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3380294Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3381311Z 2025-12-04T11:11:26.3381569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3382180Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3382642Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3383373Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3384238Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3384696Z graph_break [] 2025-12-04T11:11:26.3385047Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3386677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3388137Z if out == self.unknown_value: 2025-12-04T11:11:26.3389069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3390034Z warnings.warn( 2025-12-04T11:11:26.3390900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3391835Z warnings.warn( 2025-12-04T11:11:26.3392191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3392653Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3393086Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3393954Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3394689Z graph_break [] 2025-12-04T11:11:26.3395052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3396114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3397044Z warnings.warn( 2025-12-04T11:11:26.3397903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3398847Z warnings.warn( 2025-12-04T11:11:26.3399220Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3399678Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3400111Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3401196Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3402010Z graph_break [] 2025-12-04T11:11:26.3402379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3403452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3404406Z warnings.warn( 2025-12-04T11:11:26.3405261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3406197Z warnings.warn( 2025-12-04T11:11:26.3407181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml - 2025-12-04T11:11:26.3408305Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3409513Z FAILED [0.3998s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3410554Z 2025-12-04T11:11:26.3410656Z Expected 1 but got 2. 
2025-12-04T11:11:26.3410940Z Absolute difference: 1 2025-12-04T11:11:26.3411230Z Relative difference: 1.0 2025-12-04T11:11:26.3411416Z 2025-12-04T11:11:26.3411622Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3413021Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3414033Z 2025-12-04T11:11:26.3414307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3414890Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3415436Z ================== 1 failed, 10 deselected, 2 rerun in 20.13s ================== 2025-12-04T11:11:26.3415873Z Got exit code 1 2025-12-04T11:11:26.3416132Z Retrying single test... 2025-12-04T11:11:26.3416734Z W1204 10:57:50.675000 88942 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3417991Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml 2025-12-04T11:11:26.3418933Z ============================= test session starts ============================== 2025-12-04T11:11:26.3419590Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3420168Z cachedir: .pytest_cache 2025-12-04T11:11:26.3420855Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3421622Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3421951Z configfile: pytest.ini 2025-12-04T11:11:26.3422658Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.3423538Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3424845Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3426029Z Running 1 items in this shard 2025-12-04T11:11:26.3426244Z 2025-12-04T11:11:26.3427490Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:54.760978103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3428877Z 2025-12-04T11:11:26.3429384Z [W1204 10:58:09.832971750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3430031Z 2025-12-04T11:11:26.3430533Z [W1204 10:58:09.833218583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3431172Z 2025-12-04T11:11:26.3431687Z [W1204 10:58:09.840333771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3432325Z 2025-12-04T11:11:26.3432842Z [W1204 10:58:09.841000977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3433477Z 2025-12-04T11:11:26.3433973Z [W1204 10:58:09.841182409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3434618Z 2025-12-04T11:11:26.3435118Z [W1204 10:58:09.847813932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3435763Z 2025-12-04T11:11:26.3436263Z [W1204 10:58:09.848412743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3436893Z 2025-12-04T11:11:26.3437405Z [W1204 10:58:09.848591222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3438035Z 2025-12-04T11:11:26.3438615Z [W1204 10:58:11.787489885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3439247Z 2025-12-04T11:11:26.3439745Z [W1204 10:58:11.789188000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3440417Z 2025-12-04T11:11:26.3440914Z [W1204 10:58:11.789390463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3441622Z 2025-12-04T11:11:26.3442127Z [W1204 10:58:11.793245084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3442801Z 2025-12-04T11:11:26.3443316Z [W1204 10:58:11.793877334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3443950Z 2025-12-04T11:11:26.3444465Z [W1204 10:58:11.794067164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3445101Z 2025-12-04T11:11:26.3445600Z [W1204 10:58:11.799941951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3446245Z 2025-12-04T11:11:26.3446752Z [W1204 10:58:11.800589831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3447400Z 2025-12-04T11:11:26.3447901Z [W1204 10:58:11.800781886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3448541Z 2025-12-04T11:11:26.3448688Z ('RERUN', {'yellow': True}) [18.7960s] [100%] 2025-12-04T11:11:26.3450188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:58:11.157128943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3451558Z 2025-12-04T11:11:26.3452065Z [W1204 10:58:11.157852060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3452717Z 2025-12-04T11:11:26.3453226Z [W1204 10:58:11.158043979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3453880Z 2025-12-04T11:11:26.3454380Z [W1204 10:58:11.161864609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3455014Z 2025-12-04T11:11:26.3455533Z [W1204 10:58:11.162625941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3456162Z 2025-12-04T11:11:26.3456679Z [W1204 10:58:11.162813994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3457310Z 2025-12-04T11:11:26.3457813Z [W1204 10:58:11.168628944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3458459Z 2025-12-04T11:11:26.3458963Z [W1204 10:58:11.169218181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3459612Z 2025-12-04T11:11:26.3460116Z [W1204 10:58:11.169398528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3460764Z 2025-12-04T11:11:26.3461267Z [W1204 10:58:11.252434379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3461900Z 2025-12-04T11:11:26.3462411Z [W1204 10:58:11.253144550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3463042Z 2025-12-04T11:11:26.3463626Z [W1204 10:58:11.253341009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3464261Z 2025-12-04T11:11:26.3464759Z [W1204 10:58:11.257092232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3465437Z 2025-12-04T11:11:26.3465936Z [W1204 10:58:11.257696857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3466603Z 2025-12-04T11:11:26.3467104Z [W1204 10:58:11.257886961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3467734Z 2025-12-04T11:11:26.3468246Z [W1204 10:58:11.263746274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3468883Z 2025-12-04T11:11:26.3469398Z [W1204 10:58:11.264530455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3470031Z 2025-12-04T11:11:26.3470530Z [W1204 10:58:11.264719359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3471176Z 2025-12-04T11:11:26.3471302Z ('RERUN', {'yellow': True}) [0.4258s] [100%] 2025-12-04T11:11:26.3472793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:58:11.559615951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3474154Z 2025-12-04T11:11:26.3474672Z [W1204 10:58:11.560383229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3475305Z 2025-12-04T11:11:26.3475821Z [W1204 10:58:11.560583110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3476451Z 2025-12-04T11:11:26.3476950Z [W1204 10:58:11.564350986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3477595Z 2025-12-04T11:11:26.3478096Z [W1204 10:58:11.565090353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3478746Z 2025-12-04T11:11:26.3479246Z [W1204 10:58:11.565275565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3479880Z 2025-12-04T11:11:26.3480395Z [W1204 10:58:11.571094444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3481028Z 2025-12-04T11:11:26.3481608Z [W1204 10:58:11.571679492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3482240Z 2025-12-04T11:11:26.3482742Z [W1204 10:58:11.571859442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3483391Z 2025-12-04T11:11:26.3483892Z [W1204 10:58:12.655516158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3484535Z 2025-12-04T11:11:26.3485036Z [W1204 10:58:12.656232026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3485671Z 2025-12-04T11:11:26.3486184Z [W1204 10:58:12.656425076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3486815Z 2025-12-04T11:11:26.3487434Z [W1204 10:58:12.660194340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3488068Z 2025-12-04T11:11:26.3488568Z [W1204 10:58:12.660801764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3489251Z 2025-12-04T11:11:26.3489752Z [W1204 10:58:12.660989668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3490397Z 2025-12-04T11:11:26.3490901Z [W1204 10:58:12.666791750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3491584Z 2025-12-04T11:11:26.3492083Z [W1204 10:58:12.667550030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3492716Z 2025-12-04T11:11:26.3493233Z [W1204 10:58:12.667737671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3493865Z 2025-12-04T11:11:26.3493964Z FAILED [0.4002s] [100%] 2025-12-04T11:11:26.3494154Z 2025-12-04T11:11:26.3494297Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3495080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3495833Z Traceback (most recent call last): 2025-12-04T11:11:26.3496556Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3497414Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3498228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3498975Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3499780Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3500645Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3501352Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3501602Z 2025-12-04T11:11:26.3501706Z Expected 1 but got 2. 
2025-12-04T11:11:26.3501997Z Absolute difference: 1 2025-12-04T11:11:26.3502293Z Relative difference: 1.0 2025-12-04T11:11:26.3502478Z 2025-12-04T11:11:26.3502701Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3503922Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3504947Z 2025-12-04T11:11:26.3505209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3505829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3506299Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3507018Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3507893Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3508351Z graph_break [] 2025-12-04T11:11:26.3508709Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3510236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3511659Z if out == self.unknown_value: 2025-12-04T11:11:26.3512582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3513691Z warnings.warn( 2025-12-04T11:11:26.3514562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3515558Z warnings.warn( 2025-12-04T11:11:26.3516211Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3516948Z Traceback (most recent call last): 2025-12-04T11:11:26.3517730Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3518591Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3519399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3520137Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3520959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3521901Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3522362Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3522619Z 2025-12-04T11:11:26.3522723Z Expected 1 but got 2. 
2025-12-04T11:11:26.3523008Z Absolute difference: 1 2025-12-04T11:11:26.3523286Z Relative difference: 1.0 2025-12-04T11:11:26.3523486Z 2025-12-04T11:11:26.3523700Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3524925Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3525935Z 2025-12-04T11:11:26.3526207Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3526830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3527286Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3528015Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3528905Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3529352Z graph_break [] 2025-12-04T11:11:26.3529722Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3531271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3532703Z if out == self.unknown_value: 2025-12-04T11:11:26.3533625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3534575Z warnings.warn( 2025-12-04T11:11:26.3535448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3536399Z warnings.warn( 2025-12-04T11:11:26.3536760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3537236Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3537676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3538537Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3539451Z graph_break [] 2025-12-04T11:11:26.3539833Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3541090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3542123Z warnings.warn( 2025-12-04T11:11:26.3542998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3543990Z warnings.warn( 2025-12-04T11:11:26.3544284Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3545105Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.3545856Z Traceback (most recent call last): 2025-12-04T11:11:26.3546587Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3547436Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3548248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3548988Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3549805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3550664Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3551135Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3551379Z 2025-12-04T11:11:26.3551496Z Expected 1 but got 2. 2025-12-04T11:11:26.3551768Z Absolute difference: 1 2025-12-04T11:11:26.3552057Z Relative difference: 1.0 2025-12-04T11:11:26.3552242Z 2025-12-04T11:11:26.3552465Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3553696Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3554713Z 2025-12-04T11:11:26.3554974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3555596Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3556076Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3556814Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3557681Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3558149Z graph_break [] 2025-12-04T11:11:26.3558511Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3560041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3561550Z if out == self.unknown_value: 2025-12-04T11:11:26.3562483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3563444Z warnings.warn( 2025-12-04T11:11:26.3564352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3565302Z warnings.warn( 2025-12-04T11:11:26.3565680Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3566151Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3566575Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3567454Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3568315Z graph_break [] 2025-12-04T11:11:26.3568673Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3569402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3569535Z warnings.warn( 2025-12-04T11:11:26.3570265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3570423Z warnings.warn( 2025-12-04T11:11:26.3570639Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3570768Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3570996Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3571519Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3571630Z graph_break [] 2025-12-04T11:11:26.3571845Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3572580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3572678Z warnings.warn( 2025-12-04T11:11:26.3573392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3573513Z warnings.warn( 2025-12-04T11:11:26.3574341Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml - 2025-12-04T11:11:26.3574525Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3575448Z FAILED [0.4002s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3575456Z 2025-12-04T11:11:26.3575564Z Expected 1 but got 2. 
2025-12-04T11:11:26.3575682Z Absolute difference: 1 2025-12-04T11:11:26.3575795Z Relative difference: 1.0 2025-12-04T11:11:26.3575800Z 2025-12-04T11:11:26.3576028Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3576912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3576917Z 2025-12-04T11:11:26.3577181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3577377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3577571Z ================== 1 failed, 10 deselected, 2 rerun in 19.65s ================== 2025-12-04T11:11:26.3577680Z Got exit code 1 2025-12-04T11:11:26.3578479Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.3578883Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.3579336Z W1204 10:58:23.242000 89116 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3579983Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml 2025-12-04T11:11:26.3580161Z ============================= test session starts ============================== 2025-12-04T11:11:26.3580572Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3580683Z cachedir: .pytest_cache 2025-12-04T11:11:26.3581203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.3581351Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3581455Z configfile: pytest.ini 2025-12-04T11:11:26.3582002Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3582238Z collecting ... collected 58 items / 2 deselected / 56 selected 2025-12-04T11:11:26.3582388Z stepcurrent: skipping 2 already run items. 2025-12-04T11:11:26.3582498Z Running 9 items in this shard 2025-12-04T11:11:26.3582503Z 2025-12-04T11:11:26.3583360Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7787s] [ 11%] 2025-12-04T11:11:26.3584215Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4661s] [ 11%] 2025-12-04T11:11:26.3584975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4632s] [ 11%] 2025-12-04T11:11:26.3584981Z 2025-12-04T11:11:26.3585131Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3585624Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3585744Z Traceback (most recent call last): 2025-12-04T11:11:26.3586258Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3586491Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3586959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3587118Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3587647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3587862Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3587992Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3587998Z 2025-12-04T11:11:26.3588124Z Expected 1 but got 2. 2025-12-04T11:11:26.3588231Z Absolute difference: 1 2025-12-04T11:11:26.3588340Z Relative difference: 1.0 2025-12-04T11:11:26.3588345Z 2025-12-04T11:11:26.3588571Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3589459Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3589464Z 2025-12-04T11:11:26.3589744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3589964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3590078Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3590611Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3590834Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3590930Z graph_break [] 2025-12-04T11:11:26.3591156Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3591933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3592046Z warnings.warn( 2025-12-04T11:11:26.3592753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3592885Z warnings.warn( 2025-12-04T11:11:26.3593395Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3593512Z Traceback (most recent call last): 2025-12-04T11:11:26.3594040Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3594279Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3594728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3594898Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3595426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3595628Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3595774Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3595779Z 2025-12-04T11:11:26.3595881Z Expected 1 but got 2. 
2025-12-04T11:11:26.3596000Z Absolute difference: 1 2025-12-04T11:11:26.3596108Z Relative difference: 1.0 2025-12-04T11:11:26.3596112Z 2025-12-04T11:11:26.3596326Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3597229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3597234Z 2025-12-04T11:11:26.3597499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3597732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3597847Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3598368Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3598608Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3598705Z graph_break [] 2025-12-04T11:11:26.3598917Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3599651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3599749Z warnings.warn( 2025-12-04T11:11:26.3600468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3600570Z warnings.warn( 2025-12-04T11:11:26.3600784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3601118Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3601340Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3601922Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3602038Z graph_break [] 2025-12-04T11:11:26.3602253Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3602977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3603077Z warnings.warn( 2025-12-04T11:11:26.3603910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3604025Z warnings.warn( 2025-12-04T11:11:26.3604168Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3604746Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3604867Z Traceback (most recent call last): 2025-12-04T11:11:26.3605365Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3605645Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3606093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3606254Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3606801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3607002Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3607147Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3607153Z 2025-12-04T11:11:26.3607257Z Expected 1 but got 2. 
2025-12-04T11:11:26.3607361Z Absolute difference: 1 2025-12-04T11:11:26.3607483Z Relative difference: 1.0 2025-12-04T11:11:26.3607488Z 2025-12-04T11:11:26.3607700Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3608579Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3608600Z 2025-12-04T11:11:26.3608864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3609082Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3609216Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3609737Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3609964Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3610076Z graph_break [] 2025-12-04T11:11:26.3610287Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3611019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3611118Z warnings.warn( 2025-12-04T11:11:26.3611828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3611937Z warnings.warn( 2025-12-04T11:11:26.3612152Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3612264Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3612494Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3613010Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3613119Z graph_break [] 2025-12-04T11:11:26.3613326Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3614038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3614147Z warnings.warn( 2025-12-04T11:11:26.3614850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3614961Z warnings.warn( 2025-12-04T11:11:26.3615234Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3615346Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3615580Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3616143Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3616237Z graph_break [] 2025-12-04T11:11:26.3616461Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3617199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3617309Z warnings.warn( 2025-12-04T11:11:26.3618012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3618115Z warnings.warn( 2025-12-04T11:11:26.3618952Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml - 2025-12-04T11:11:26.3619125Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3620053Z FAILED [0.4632s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3620061Z 2025-12-04T11:11:26.3620167Z Expected 1 but got 2. 2025-12-04T11:11:26.3620271Z Absolute difference: 1 2025-12-04T11:11:26.3620388Z Relative difference: 1.0 2025-12-04T11:11:26.3620393Z 2025-12-04T11:11:26.3620608Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3621507Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3621512Z 2025-12-04T11:11:26.3621773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3621951Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3622153Z =================== 1 failed, 2 deselected, 2 rerun in 4.74s =================== 2025-12-04T11:11:26.3622249Z Got exit code 1 2025-12-04T11:11:26.3622356Z Retrying single test... 
2025-12-04T11:11:26.3622802Z W1204 10:58:42.773000 89292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3623444Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml 2025-12-04T11:11:26.3623616Z ============================= test session starts ============================== 2025-12-04T11:11:26.3623960Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3624068Z cachedir: .pytest_cache 2025-12-04T11:11:26.3624589Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3624710Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3624829Z configfile: pytest.ini 2025-12-04T11:11:26.3625357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3625573Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3626546Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3626727Z Running 1 items in this shard 2025-12-04T11:11:26.3626733Z 2025-12-04T11:11:26.3627997Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:58:46.934870676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3628035Z 2025-12-04T11:11:26.3628545Z [W1204 10:59:01.057249516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3628583Z 2025-12-04T11:11:26.3629099Z [W1204 10:59:01.057500915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3629105Z 2025-12-04T11:11:26.3629603Z [W1204 10:59:01.064606512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3629608Z 2025-12-04T11:11:26.3630111Z [W1204 10:59:01.065260370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3630129Z 2025-12-04T11:11:26.3630624Z [W1204 10:59:01.065443341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3630631Z 2025-12-04T11:11:26.3631126Z [W1204 10:59:01.072107141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3631133Z 2025-12-04T11:11:26.3631648Z [W1204 10:59:01.072730775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3631653Z 2025-12-04T11:11:26.3632147Z [W1204 10:59:01.072912663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3632152Z 2025-12-04T11:11:26.3632665Z [W1204 10:59:03.015186102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3632670Z 2025-12-04T11:11:26.3633168Z [W1204 10:59:03.017137211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3633175Z 2025-12-04T11:11:26.3633685Z [W1204 10:59:03.017338573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3633691Z 2025-12-04T11:11:26.3634187Z [W1204 10:59:03.021211821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3634192Z 2025-12-04T11:11:26.3634685Z [W1204 10:59:03.021840383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3634702Z 2025-12-04T11:11:26.3635202Z [W1204 10:59:03.022030889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3635207Z 2025-12-04T11:11:26.3635705Z [W1204 10:59:03.027927001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3635712Z 2025-12-04T11:11:26.3636220Z [W1204 10:59:03.028534605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3636224Z 2025-12-04T11:11:26.3636723Z [W1204 10:59:03.028721918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3636727Z 2025-12-04T11:11:26.3636868Z ('RERUN', {'yellow': True}) [18.9130s] [100%] 2025-12-04T11:11:26.3638176Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:03.445813069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3638182Z 2025-12-04T11:11:26.3638697Z [W1204 10:59:03.446587586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3638732Z 2025-12-04T11:11:26.3639232Z [W1204 10:59:03.446786996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3639236Z 2025-12-04T11:11:26.3639777Z [W1204 10:59:03.450665381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3639782Z 2025-12-04T11:11:26.3640277Z [W1204 10:59:03.451466801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3640282Z 2025-12-04T11:11:26.3640783Z [W1204 10:59:03.451656784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3640800Z 2025-12-04T11:11:26.3641295Z [W1204 10:59:03.457536331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3641302Z 2025-12-04T11:11:26.3641894Z [W1204 10:59:03.458156152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3641900Z 2025-12-04T11:11:26.3642414Z [W1204 10:59:03.458338979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3642422Z 2025-12-04T11:11:26.3642917Z [W1204 10:59:03.542263457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3642922Z 2025-12-04T11:11:26.3643436Z [W1204 10:59:03.543020452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3643442Z 2025-12-04T11:11:26.3643940Z [W1204 10:59:03.543224932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3643947Z 2025-12-04T11:11:26.3644460Z [W1204 10:59:03.547039609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3644464Z 2025-12-04T11:11:26.3644962Z [W1204 10:59:03.547649020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3644969Z 2025-12-04T11:11:26.3645480Z [W1204 10:59:03.547837223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3645484Z 2025-12-04T11:11:26.3645981Z [W1204 10:59:03.553760295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3645989Z 2025-12-04T11:11:26.3646486Z [W1204 10:59:03.554557388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3646507Z 2025-12-04T11:11:26.3647002Z [W1204 10:59:03.554745043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3647007Z 2025-12-04T11:11:26.3647131Z ('RERUN', {'yellow': True}) [0.4879s] [100%] 2025-12-04T11:11:26.3648391Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:04.914874047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3648397Z 2025-12-04T11:11:26.3648963Z [W1204 10:59:04.915594534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3648968Z 2025-12-04T11:11:26.3649481Z [W1204 10:59:04.915786930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3649515Z 2025-12-04T11:11:26.3650008Z [W1204 10:59:04.919600185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3650013Z 2025-12-04T11:11:26.3650518Z [W1204 10:59:04.920379329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3650551Z 2025-12-04T11:11:26.3651048Z [W1204 10:59:04.920570726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3651053Z 2025-12-04T11:11:26.3651548Z [W1204 10:59:04.926408373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3651571Z 2025-12-04T11:11:26.3652065Z [W1204 10:59:04.927005548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3652069Z 2025-12-04T11:11:26.3652570Z [W1204 10:59:04.927187819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3652575Z 2025-12-04T11:11:26.3653081Z [W1204 10:59:04.012543738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3653088Z 2025-12-04T11:11:26.3653582Z [W1204 10:59:04.013309458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3653587Z 2025-12-04T11:11:26.3654096Z [W1204 10:59:04.013518408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3654101Z 2025-12-04T11:11:26.3654599Z [W1204 10:59:04.017365067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3654604Z 2025-12-04T11:11:26.3655112Z [W1204 10:59:04.017986550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3655119Z 2025-12-04T11:11:26.3655619Z [W1204 10:59:04.018187722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3655625Z 2025-12-04T11:11:26.3656133Z [W1204 10:59:04.024084465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3656139Z 2025-12-04T11:11:26.3656683Z [W1204 10:59:04.024871088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3656688Z 2025-12-04T11:11:26.3657189Z [W1204 10:59:04.025058807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3657206Z 2025-12-04T11:11:26.3657306Z FAILED [0.4681s] [100%] 2025-12-04T11:11:26.3657313Z 2025-12-04T11:11:26.3657453Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3657963Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3658085Z Traceback (most recent call last): 2025-12-04T11:11:26.3658584Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3658824Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3659276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3659531Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3660058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3660259Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3660427Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3660432Z 2025-12-04T11:11:26.3660536Z Expected 1 but got 2. 
2025-12-04T11:11:26.3660644Z Absolute difference: 1 2025-12-04T11:11:26.3660763Z Relative difference: 1.0 2025-12-04T11:11:26.3660796Z 2025-12-04T11:11:26.3661008Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3661914Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3661919Z 2025-12-04T11:11:26.3662178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3662397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3662525Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3663042Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3663280Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3663379Z graph_break [] 2025-12-04T11:11:26.3663589Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3664783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3664896Z if out == self.unknown_value: 2025-12-04T11:11:26.3665624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3665723Z warnings.warn( 2025-12-04T11:11:26.3666426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3666541Z warnings.warn( 2025-12-04T11:11:26.3667039Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3667188Z Traceback (most recent call last): 2025-12-04T11:11:26.3667685Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3667913Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3668375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3668542Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3669064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3669282Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3669411Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3669416Z 2025-12-04T11:11:26.3669532Z Expected 1 but got 2. 
2025-12-04T11:11:26.3669638Z Absolute difference: 1 2025-12-04T11:11:26.3669749Z Relative difference: 1.0 2025-12-04T11:11:26.3669754Z 2025-12-04T11:11:26.3669978Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3670866Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3670872Z 2025-12-04T11:11:26.3671216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3671431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3671541Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3672103Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3672327Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3672427Z graph_break [] 2025-12-04T11:11:26.3672683Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3673874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3674005Z if out == self.unknown_value: 2025-12-04T11:11:26.3674724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3674823Z warnings.warn( 2025-12-04T11:11:26.3675546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3675646Z warnings.warn( 2025-12-04T11:11:26.3675874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3675991Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3676216Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3676742Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3676840Z graph_break [] 2025-12-04T11:11:26.3677054Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3677774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3677873Z warnings.warn( 2025-12-04T11:11:26.3678586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3678682Z warnings.warn( 2025-12-04T11:11:26.3678829Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3679342Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3679462Z Traceback (most recent call last): 2025-12-04T11:11:26.3679974Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3680201Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3680649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3680827Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3681354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3681619Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3681766Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3681771Z 2025-12-04T11:11:26.3681872Z Expected 1 but got 2. 2025-12-04T11:11:26.3681990Z Absolute difference: 1 2025-12-04T11:11:26.3682096Z Relative difference: 1.0 2025-12-04T11:11:26.3682101Z 2025-12-04T11:11:26.3682312Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3683287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3683293Z 2025-12-04T11:11:26.3683556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3683811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3683925Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3684444Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3684710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3684804Z graph_break [] 2025-12-04T11:11:26.3685016Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3686219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3686337Z if out == self.unknown_value: 2025-12-04T11:11:26.3687058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3687161Z warnings.warn( 2025-12-04T11:11:26.3687867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3687982Z warnings.warn( 2025-12-04T11:11:26.3688200Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3688326Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3688548Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3689065Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3689175Z graph_break [] 2025-12-04T11:11:26.3689384Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3690093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3690206Z warnings.warn( 2025-12-04T11:11:26.3690906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3691015Z warnings.warn( 2025-12-04T11:11:26.3691224Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3691338Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3691571Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3692091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3692201Z graph_break [] 2025-12-04T11:11:26.3692411Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3693116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3693226Z warnings.warn( 2025-12-04T11:11:26.3693928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3694026Z warnings.warn( 2025-12-04T11:11:26.3694857Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml - 2025-12-04T11:11:26.3695086Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3696018Z FAILED [0.4681s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3696053Z 2025-12-04T11:11:26.3696156Z Expected 1 but got 2. 
2025-12-04T11:11:26.3696260Z Absolute difference: 1 2025-12-04T11:11:26.3696380Z Relative difference: 1.0 2025-12-04T11:11:26.3696416Z 2025-12-04T11:11:26.3696632Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3697538Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3697543Z 2025-12-04T11:11:26.3697813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3697990Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3698200Z ================== 1 failed, 10 deselected, 2 rerun in 19.90s ================== 2025-12-04T11:11:26.3698299Z Got exit code 1 2025-12-04T11:11:26.3698418Z Retrying single test... 2025-12-04T11:11:26.3698854Z W1204 10:59:15.431000 89473 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3699494Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml 2025-12-04T11:11:26.3699672Z ============================= test session starts ============================== 2025-12-04T11:11:26.3700015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3700124Z cachedir: .pytest_cache 2025-12-04T11:11:26.3700645Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3700766Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3701054Z configfile: pytest.ini 2025-12-04T11:11:26.3701587Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.3701802Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3702787Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3702903Z Running 1 items in this shard 2025-12-04T11:11:26.3702908Z 2025-12-04T11:11:26.3704177Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:18.597354727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3704183Z 2025-12-04T11:11:26.3704691Z [W1204 10:59:34.254683869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3704698Z 2025-12-04T11:11:26.3705219Z [W1204 10:59:34.254938694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3705226Z 2025-12-04T11:11:26.3705726Z [W1204 10:59:34.262074058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3705731Z 2025-12-04T11:11:26.3706246Z [W1204 10:59:34.262766226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3706251Z 2025-12-04T11:11:26.3706924Z [W1204 10:59:34.262949376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3706929Z 2025-12-04T11:11:26.3707430Z [W1204 10:59:34.269618954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3707475Z 2025-12-04T11:11:26.3707991Z [W1204 10:59:34.270251613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3707996Z 2025-12-04T11:11:26.3708546Z [W1204 10:59:34.270434854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3708550Z 2025-12-04T11:11:26.3709064Z [W1204 10:59:36.209662939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3709069Z 2025-12-04T11:11:26.3709574Z [W1204 10:59:36.211414846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3709579Z 2025-12-04T11:11:26.3710091Z [W1204 10:59:36.211622905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3710099Z 2025-12-04T11:11:26.3710595Z [W1204 10:59:36.215549934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3710599Z 2025-12-04T11:11:26.3711109Z [W1204 10:59:36.216197946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3711116Z 2025-12-04T11:11:26.3711610Z [W1204 10:59:36.216388543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3711615Z 2025-12-04T11:11:26.3712113Z [W1204 10:59:36.222483482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3712132Z 2025-12-04T11:11:26.3712626Z [W1204 10:59:36.223151602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3712633Z 2025-12-04T11:11:26.3713129Z [W1204 10:59:36.223339449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3713133Z 2025-12-04T11:11:26.3713277Z ('RERUN', {'yellow': True}) [19.4599s] [100%] 2025-12-04T11:11:26.3714528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:37.645191035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3714533Z 2025-12-04T11:11:26.3715049Z [W1204 10:59:37.645959842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3715054Z 2025-12-04T11:11:26.3715551Z [W1204 10:59:37.646169365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3715558Z 2025-12-04T11:11:26.3716070Z [W1204 10:59:37.650069536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3716074Z 2025-12-04T11:11:26.3716574Z [W1204 10:59:37.650862953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3716581Z 2025-12-04T11:11:26.3717088Z [W1204 10:59:37.651047734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3717092Z 2025-12-04T11:11:26.3717654Z [W1204 10:59:37.656932654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3717658Z 2025-12-04T11:11:26.3718156Z [W1204 10:59:37.657533138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3718199Z 2025-12-04T11:11:26.3718699Z [W1204 10:59:37.657713579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3718704Z 2025-12-04T11:11:26.3719200Z [W1204 10:59:37.742076908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3719234Z 2025-12-04T11:11:26.3719746Z [W1204 10:59:37.742823260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3719751Z 2025-12-04T11:11:26.3720246Z [W1204 10:59:37.743018813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3720255Z 2025-12-04T11:11:26.3720766Z [W1204 10:59:37.746849396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3720770Z 2025-12-04T11:11:26.3721267Z [W1204 10:59:37.747460899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3721272Z 2025-12-04T11:11:26.3721852Z [W1204 10:59:37.747648351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3721865Z 2025-12-04T11:11:26.3722358Z [W1204 10:59:37.753588698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3722362Z 2025-12-04T11:11:26.3722858Z [W1204 10:59:37.754408041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3722876Z 2025-12-04T11:11:26.3723375Z [W1204 10:59:37.754594689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3723380Z 2025-12-04T11:11:26.3723506Z ('RERUN', {'yellow': True}) [0.4918s] [100%] 2025-12-04T11:11:26.3724765Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:37.112188009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3724771Z 2025-12-04T11:11:26.3725267Z [W1204 10:59:37.112926268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3725271Z 2025-12-04T11:11:26.3725783Z [W1204 10:59:37.113122062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3725788Z 2025-12-04T11:11:26.3726285Z [W1204 10:59:37.116958308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3726289Z 2025-12-04T11:11:26.3726800Z [W1204 10:59:37.117734360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3726807Z 2025-12-04T11:11:26.3727305Z [W1204 10:59:37.117923149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3727311Z 2025-12-04T11:11:26.3727823Z [W1204 10:59:37.123862035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3727827Z 2025-12-04T11:11:26.3728323Z [W1204 10:59:37.124498010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3728328Z 2025-12-04T11:11:26.3728893Z [W1204 10:59:37.124681070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3728913Z 2025-12-04T11:11:26.3729411Z [W1204 10:59:37.209717231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3729442Z 2025-12-04T11:11:26.3729941Z [W1204 10:59:37.210508883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3729973Z 2025-12-04T11:11:26.3730486Z [W1204 10:59:37.210714155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3730491Z 2025-12-04T11:11:26.3730990Z [W1204 10:59:37.214548285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3730995Z 2025-12-04T11:11:26.3731509Z [W1204 10:59:37.215170892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3731514Z 2025-12-04T11:11:26.3732010Z [W1204 10:59:37.215359696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3732017Z 2025-12-04T11:11:26.3732529Z [W1204 10:59:37.221283010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3732535Z 2025-12-04T11:11:26.3733032Z [W1204 10:59:37.222080794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3733037Z 2025-12-04T11:11:26.3733547Z [W1204 10:59:37.222278694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3733552Z 2025-12-04T11:11:26.3733650Z FAILED [0.4656s] [100%] 2025-12-04T11:11:26.3733655Z 2025-12-04T11:11:26.3733798Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3734305Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3734426Z Traceback (most recent call last): 2025-12-04T11:11:26.3734939Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3735170Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3735623Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3735798Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3736322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3736526Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3736668Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3736673Z 2025-12-04T11:11:26.3736777Z Expected 1 but got 2. 
2025-12-04T11:11:26.3736896Z Absolute difference: 1 2025-12-04T11:11:26.3737005Z Relative difference: 1.0 2025-12-04T11:11:26.3737009Z 2025-12-04T11:11:26.3737220Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3738119Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3738127Z 2025-12-04T11:11:26.3738387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3738618Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3738731Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3739314Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3739553Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3739656Z graph_break [] 2025-12-04T11:11:26.3739898Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3741096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3741257Z if out == self.unknown_value: 2025-12-04T11:11:26.3741987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3742090Z warnings.warn( 2025-12-04T11:11:26.3742801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3742915Z warnings.warn( 2025-12-04T11:11:26.3743413Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3743555Z Traceback (most recent call last): 2025-12-04T11:11:26.3744056Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3744286Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3744747Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3744911Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3745434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3745654Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3745785Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3745790Z 2025-12-04T11:11:26.3745910Z Expected 1 but got 2. 
2025-12-04T11:11:26.3746018Z Absolute difference: 1 2025-12-04T11:11:26.3746128Z Relative difference: 1.0 2025-12-04T11:11:26.3746132Z 2025-12-04T11:11:26.3746359Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3747245Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3747252Z 2025-12-04T11:11:26.3747531Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3747747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3747861Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3748397Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3748620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3748730Z graph_break [] 2025-12-04T11:11:26.3748942Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3750120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3750246Z if out == self.unknown_value: 2025-12-04T11:11:26.3750957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3751056Z warnings.warn( 2025-12-04T11:11:26.3751835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3751937Z warnings.warn( 2025-12-04T11:11:26.3752164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3752306Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3752529Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3753061Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3753184Z graph_break [] 2025-12-04T11:11:26.3753404Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3754115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3754218Z warnings.warn( 2025-12-04T11:11:26.3754940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3755039Z warnings.warn( 2025-12-04T11:11:26.3755178Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3755690Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3755810Z Traceback (most recent call last): 2025-12-04T11:11:26.3756318Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3756543Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3756990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3757167Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3757691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3757906Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3758035Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3758040Z 2025-12-04T11:11:26.3758142Z Expected 1 but got 2. 2025-12-04T11:11:26.3758261Z Absolute difference: 1 2025-12-04T11:11:26.3758368Z Relative difference: 1.0 2025-12-04T11:11:26.3758375Z 2025-12-04T11:11:26.3758581Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3759480Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3759485Z 2025-12-04T11:11:26.3759755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3759978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3760091Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3760608Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3760841Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3760937Z graph_break [] 2025-12-04T11:11:26.3761162Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3762408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3762527Z if out == self.unknown_value: 2025-12-04T11:11:26.3763322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3763423Z warnings.warn( 2025-12-04T11:11:26.3764140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3764270Z warnings.warn( 2025-12-04T11:11:26.3764531Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3764686Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3764908Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3765422Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3765530Z graph_break [] 2025-12-04T11:11:26.3765745Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3766471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3766572Z warnings.warn( 2025-12-04T11:11:26.3767276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3767387Z warnings.warn( 2025-12-04T11:11:26.3767601Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3767713Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3767945Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3768458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3768565Z graph_break [] 2025-12-04T11:11:26.3768779Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3769487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3769601Z warnings.warn( 2025-12-04T11:11:26.3770301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3770414Z warnings.warn( 2025-12-04T11:11:26.3771234Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml - 2025-12-04T11:11:26.3771401Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3772337Z FAILED [0.4656s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3772343Z 2025-12-04T11:11:26.3772445Z Expected 1 but got 2. 
2025-12-04T11:11:26.3772561Z Absolute difference: 1 2025-12-04T11:11:26.3772671Z Relative difference: 1.0 2025-12-04T11:11:26.3772676Z 2025-12-04T11:11:26.3772887Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3773782Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3773789Z 2025-12-04T11:11:26.3774050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3774242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3774435Z ================== 1 failed, 10 deselected, 2 rerun in 20.45s ================== 2025-12-04T11:11:26.3774606Z Got exit code 1 2025-12-04T11:11:26.3775422Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3775857Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.3776289Z W1204 10:59:48.576000 89654 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3776979Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml 2025-12-04T11:11:26.3777141Z ============================= test session starts ============================== 2025-12-04T11:11:26.3777495Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3777602Z cachedir: .pytest_cache 2025-12-04T11:11:26.3778117Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.3778252Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3778357Z configfile: pytest.ini 2025-12-04T11:11:26.3778900Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3779109Z collecting ... collected 58 items / 3 deselected / 55 selected 2025-12-04T11:11:26.3779248Z stepcurrent: skipping 3 already run items. 2025-12-04T11:11:26.3779374Z Running 8 items in this shard 2025-12-04T11:11:26.3779380Z 2025-12-04T11:11:26.3780229Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8029s] [ 12%] 2025-12-04T11:11:26.3781085Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4767s] [ 12%] 2025-12-04T11:11:26.3781838Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4701s] [ 12%] 2025-12-04T11:11:26.3781846Z 2025-12-04T11:11:26.3781981Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3782487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3782608Z Traceback (most recent call last): 2025-12-04T11:11:26.3783119Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3783343Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3783796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3783969Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3784496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3784699Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3784842Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3784847Z 2025-12-04T11:11:26.3784951Z Expected 1 but got 2. 2025-12-04T11:11:26.3785067Z Absolute difference: 1 2025-12-04T11:11:26.3785171Z Relative difference: 1.0 2025-12-04T11:11:26.3785175Z 2025-12-04T11:11:26.3785385Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3786286Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3786357Z 2025-12-04T11:11:26.3786615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3786840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3787019Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3787536Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3787774Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3787899Z graph_break [] 2025-12-04T11:11:26.3788112Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3788842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3788939Z warnings.warn( 2025-12-04T11:11:26.3789667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3789767Z warnings.warn( 2025-12-04T11:11:26.3790263Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3790400Z Traceback (most recent call last): 2025-12-04T11:11:26.3790895Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3791138Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3791588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3791750Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3792292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3792500Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3792628Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3792648Z 2025-12-04T11:11:26.3792754Z Expected 1 but got 2. 
2025-12-04T11:11:26.3792862Z Absolute difference: 1 2025-12-04T11:11:26.3792984Z Relative difference: 1.0 2025-12-04T11:11:26.3792989Z 2025-12-04T11:11:26.3793201Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3794092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3794099Z 2025-12-04T11:11:26.3794375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3794589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3794721Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3795242Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3795467Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3795581Z graph_break [] 2025-12-04T11:11:26.3795794Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3796523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3796622Z warnings.warn( 2025-12-04T11:11:26.3797330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3797439Z warnings.warn( 2025-12-04T11:11:26.3797653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3797826Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3798067Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3798583Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3798724Z graph_break [] 2025-12-04T11:11:26.3798934Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3799997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3800188Z warnings.warn( 2025-12-04T11:11:26.3801212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3801345Z warnings.warn( 2025-12-04T11:11:26.3801596Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3802164Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3802381Z Traceback (most recent call last): 2025-12-04T11:11:26.3802986Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3803260Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3803758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3804001Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3804538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3804879Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3805060Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3805065Z 2025-12-04T11:11:26.3805204Z Expected 1 but got 2. 
2025-12-04T11:11:26.3805392Z Absolute difference: 1 2025-12-04T11:11:26.3805537Z Relative difference: 1.0 2025-12-04T11:11:26.3805548Z 2025-12-04T11:11:26.3805772Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3806796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3806804Z 2025-12-04T11:11:26.3807112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3807411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3807560Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3808115Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3808434Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3808597Z graph_break [] 2025-12-04T11:11:26.3808900Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3809650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3809787Z warnings.warn( 2025-12-04T11:11:26.3810547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3810714Z warnings.warn( 2025-12-04T11:11:26.3811037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3811184Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3811582Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3812180Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3812347Z graph_break [] 2025-12-04T11:11:26.3812631Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3813439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3813617Z warnings.warn( 2025-12-04T11:11:26.3814398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3814542Z warnings.warn( 2025-12-04T11:11:26.3814767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3815020Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3815281Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3815876Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3816018Z graph_break [] 2025-12-04T11:11:26.3816262Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3817063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3817217Z warnings.warn( 2025-12-04T11:11:26.3817956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3818133Z warnings.warn( 2025-12-04T11:11:26.3819005Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml - 2025-12-04T11:11:26.3819234Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3820241Z FAILED [0.4701s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3820249Z 2025-12-04T11:11:26.3820458Z Expected 1 but got 2. 2025-12-04T11:11:26.3820599Z Absolute difference: 1 2025-12-04T11:11:26.3820749Z Relative difference: 1.0 2025-12-04T11:11:26.3820754Z 2025-12-04T11:11:26.3821044Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3821937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3821948Z 2025-12-04T11:11:26.3822351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3822563Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3822794Z =================== 1 failed, 3 deselected, 2 rerun in 4.78s =================== 2025-12-04T11:11:26.3822981Z Got exit code 1 2025-12-04T11:11:26.3823117Z Retrying single test... 
2025-12-04T11:11:26.3823564Z W1204 11:00:08.194000 89830 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3824342Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml 2025-12-04T11:11:26.3824537Z ============================= test session starts ============================== 2025-12-04T11:11:26.3824968Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3825176Z cachedir: .pytest_cache 2025-12-04T11:11:26.3825724Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3825972Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3826143Z configfile: pytest.ini 2025-12-04T11:11:26.3826756Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.3827008Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3828045Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3828220Z Running 1 items in this shard 2025-12-04T11:11:26.3828225Z 2025-12-04T11:11:26.3829546Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:11.376811483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3829554Z 2025-12-04T11:11:26.3830179Z [W1204 11:00:27.915699899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3830185Z 2025-12-04T11:11:26.3830728Z [W1204 11:00:27.915955422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3830735Z 2025-12-04T11:11:26.3831313Z [W1204 11:00:27.923070450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3831318Z 2025-12-04T11:11:26.3831848Z [W1204 11:00:27.923778594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3831857Z 2025-12-04T11:11:26.3832462Z [W1204 11:00:27.923963333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3832467Z 2025-12-04T11:11:26.3833020Z [W1204 11:00:27.930633003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3833025Z 2025-12-04T11:11:26.3833558Z [W1204 11:00:27.931289807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3833613Z 2025-12-04T11:11:26.3834143Z [W1204 11:00:27.931476942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3834148Z 2025-12-04T11:11:26.3834681Z [W1204 11:00:29.868191998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3834685Z 2025-12-04T11:11:26.3835247Z [W1204 11:00:29.869872704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3835251Z 2025-12-04T11:11:26.3835830Z [W1204 11:00:29.870099992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3835835Z 2025-12-04T11:11:26.3836430Z [W1204 11:00:29.873871391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3836438Z 2025-12-04T11:11:26.3836973Z [W1204 11:00:29.874493890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3836977Z 2025-12-04T11:11:26.3837553Z [W1204 11:00:29.874682459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3837558Z 2025-12-04T11:11:26.3838161Z [W1204 11:00:29.880505725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3838166Z 2025-12-04T11:11:26.3838757Z [W1204 11:00:29.881113208 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3838790Z 2025-12-04T11:11:26.3839340Z [W1204 11:00:29.881300893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3839371Z 2025-12-04T11:11:26.3839535Z ('RERUN', {'yellow': True}) [19.3569s] [100%] 2025-12-04T11:11:26.3840865Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:29.304864586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3840870Z 2025-12-04T11:11:26.3841407Z [W1204 11:00:29.305609619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3841411Z 2025-12-04T11:11:26.3842070Z [W1204 11:00:29.305810537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3842081Z 2025-12-04T11:11:26.3842656Z [W1204 11:00:29.309631300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3842663Z 2025-12-04T11:11:26.3843265Z [W1204 11:00:29.310449639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3843270Z 2025-12-04T11:11:26.3843801Z [W1204 11:00:29.310647500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3843806Z 2025-12-04T11:11:26.3844403Z [W1204 11:00:29.316496260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3844409Z 2025-12-04T11:11:26.3844944Z [W1204 11:00:29.317127118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3844951Z 2025-12-04T11:11:26.3845547Z [W1204 11:00:29.317313857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3845554Z 2025-12-04T11:11:26.3846103Z [W1204 11:00:29.400768315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3846108Z 2025-12-04T11:11:26.3846698Z [W1204 11:00:29.401536254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3846703Z 2025-12-04T11:11:26.3847238Z [W1204 11:00:29.401737880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3847242Z 2025-12-04T11:11:26.3847775Z [W1204 11:00:29.405561958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3847782Z 2025-12-04T11:11:26.3848346Z [W1204 11:00:29.406183261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3848351Z 2025-12-04T11:11:26.3848928Z [W1204 11:00:29.406373932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3848932Z 2025-12-04T11:11:26.3849530Z [W1204 11:00:29.412188351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3849535Z 2025-12-04T11:11:26.3850144Z [W1204 11:00:29.412963014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3850150Z 2025-12-04T11:11:26.3850729Z [W1204 11:00:29.413161970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3850760Z 2025-12-04T11:11:26.3850927Z ('RERUN', {'yellow': True}) [0.4942s] [100%] 2025-12-04T11:11:26.3852266Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:30.780534137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3852301Z 2025-12-04T11:11:26.3852870Z [W1204 11:00:30.781278196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3852875Z 2025-12-04T11:11:26.3853465Z [W1204 11:00:30.781472746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3853470Z 2025-12-04T11:11:26.3854000Z [W1204 11:00:30.785269993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3854006Z 2025-12-04T11:11:26.3854537Z [W1204 11:00:30.786038368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3854590Z 2025-12-04T11:11:26.3855106Z [W1204 11:00:30.786234660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3855111Z 2025-12-04T11:11:26.3855686Z [W1204 11:00:30.792049527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3855691Z 2025-12-04T11:11:26.3856293Z [W1204 11:00:30.792661229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3856297Z 2025-12-04T11:11:26.3856831Z [W1204 11:00:30.792842829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3856838Z 2025-12-04T11:11:26.3857484Z [W1204 11:00:30.877400876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3857490Z 2025-12-04T11:11:26.3858132Z [W1204 11:00:30.878168999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3858142Z 2025-12-04T11:11:26.3858795Z [W1204 11:00:30.878370694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3858803Z 2025-12-04T11:11:26.3859409Z [W1204 11:00:30.882226433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3859415Z 2025-12-04T11:11:26.3860086Z [W1204 11:00:30.882866163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3860096Z 2025-12-04T11:11:26.3860628Z [W1204 11:00:30.883057308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3860633Z 2025-12-04T11:11:26.3861162Z [W1204 11:00:30.888899778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3861216Z 2025-12-04T11:11:26.3861745Z [W1204 11:00:30.889700015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3861750Z 2025-12-04T11:11:26.3862327Z [W1204 11:00:30.889890767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3862407Z 2025-12-04T11:11:26.3862609Z FAILED [0.4741s] [100%] 2025-12-04T11:11:26.3862614Z 2025-12-04T11:11:26.3862797Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.3863421Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3863580Z Traceback (most recent call last): 2025-12-04T11:11:26.3864094Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3864501Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3864995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3865248Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3865811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3866052Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3866288Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3866294Z 2025-12-04T11:11:26.3866453Z Expected 1 but got 2. 
2025-12-04T11:11:26.3866652Z Absolute difference: 1 2025-12-04T11:11:26.3866794Z Relative difference: 1.0 2025-12-04T11:11:26.3866800Z 2025-12-04T11:11:26.3867042Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3867989Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3867995Z 2025-12-04T11:11:26.3868325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3868647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3868810Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3869365Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3869666Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3869778Z graph_break [] 2025-12-04T11:11:26.3870062Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3871362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3871513Z if out == self.unknown_value: 2025-12-04T11:11:26.3872306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3872443Z warnings.warn( 2025-12-04T11:11:26.3873159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3873397Z warnings.warn( 2025-12-04T11:11:26.3873938Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3874140Z Traceback (most recent call last): 2025-12-04T11:11:26.3874668Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3874933Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3875472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3875700Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3876372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3876611Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3876773Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3876807Z 2025-12-04T11:11:26.3876973Z Expected 1 but got 2. 
2025-12-04T11:11:26.3877156Z Absolute difference: 1 2025-12-04T11:11:26.3889821Z Relative difference: 1.0 2025-12-04T11:11:26.3889830Z 2025-12-04T11:11:26.3890065Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3891115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3891121Z 2025-12-04T11:11:26.3891389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3891622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3891753Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3892278Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3892520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3892623Z graph_break [] 2025-12-04T11:11:26.3892838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3894043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.3894159Z if out == self.unknown_value: 2025-12-04T11:11:26.3894885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3894988Z warnings.warn( 2025-12-04T11:11:26.3895693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3895805Z warnings.warn( 2025-12-04T11:11:26.3896021Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3896134Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3896371Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3896891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3897004Z graph_break [] 2025-12-04T11:11:26.3897216Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3897930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3898045Z warnings.warn( 2025-12-04T11:11:26.3898743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3898859Z warnings.warn( 2025-12-04T11:11:26.3898999Z =================================== FAILURES =================================== 2025-12-04T11:11:26.3899495Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.3899629Z Traceback (most recent call last): 2025-12-04T11:11:26.3900126Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.3900352Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.3901183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.3901351Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.3901894Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.3902151Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.3902280Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3902285Z 2025-12-04T11:11:26.3902451Z Expected 1 but got 2. 2025-12-04T11:11:26.3902557Z Absolute difference: 1 2025-12-04T11:11:26.3902665Z Relative difference: 1.0 2025-12-04T11:11:26.3902684Z 2025-12-04T11:11:26.3902895Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3903780Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3903791Z 2025-12-04T11:11:26.3904065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3904282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3904396Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3904927Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3905150Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3905260Z graph_break [] 2025-12-04T11:11:26.3905471Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.3906654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.3906784Z if out == self.unknown_value: 2025-12-04T11:11:26.3907499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3907611Z warnings.warn( 2025-12-04T11:11:26.3908314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3908413Z warnings.warn( 2025-12-04T11:11:26.3908640Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3908750Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3908971Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3909495Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3909598Z graph_break [] 2025-12-04T11:11:26.3909820Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3910525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3910624Z warnings.warn( 2025-12-04T11:11:26.3911341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3911442Z warnings.warn( 2025-12-04T11:11:26.3911667Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.3911779Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.3912001Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.3912586Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.3912682Z graph_break [] 2025-12-04T11:11:26.3912891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.3913610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3913736Z warnings.warn( 2025-12-04T11:11:26.3914452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.3914578Z warnings.warn( 2025-12-04T11:11:26.3915395Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml - 2025-12-04T11:11:26.3915574Z =========================== short test summary info ============================ 2025-12-04T11:11:26.3916499Z FAILED [0.4741s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.3916507Z 2025-12-04T11:11:26.3916623Z Expected 1 but got 2. 
2025-12-04T11:11:26.3916726Z Absolute difference: 1 2025-12-04T11:11:26.3916833Z Relative difference: 1.0 2025-12-04T11:11:26.3916838Z 2025-12-04T11:11:26.3917062Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.3917952Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3919005Z 2025-12-04T11:11:26.3919269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.3919852Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.3920353Z ================== 1 failed, 10 deselected, 2 rerun in 20.36s ================== 2025-12-04T11:11:26.3920785Z Got exit code 1 2025-12-04T11:11:26.3921049Z Retrying single test... 2025-12-04T11:11:26.3921743Z W1204 11:00:41.304000 90011 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.3922962Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml 2025-12-04T11:11:26.3923918Z ============================= test session starts ============================== 2025-12-04T11:11:26.3924573Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.3925153Z cachedir: .pytest_cache 2025-12-04T11:11:26.3925851Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.3926650Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.3927036Z configfile: pytest.ini 2025-12-04T11:11:26.3927766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.3928654Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.3929969Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.3931164Z Running 1 items in this shard 2025-12-04T11:11:26.3931388Z 2025-12-04T11:11:26.3932646Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:44.479874186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3934131Z 2025-12-04T11:11:26.3934644Z [W1204 11:01:00.154424059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3935290Z 2025-12-04T11:11:26.3935850Z [W1204 11:01:00.154673944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3936483Z 2025-12-04T11:11:26.3937000Z [W1204 11:01:00.161921944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3937665Z 2025-12-04T11:11:26.3938169Z [W1204 11:01:00.162664746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3938817Z 2025-12-04T11:11:26.3939325Z [W1204 11:01:00.162851385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3939972Z 2025-12-04T11:11:26.3940477Z [W1204 11:01:00.169598287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3941128Z 2025-12-04T11:11:26.3941634Z [W1204 11:01:00.170321879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3942264Z 2025-12-04T11:11:26.3942782Z [W1204 11:01:00.170516066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3943418Z 2025-12-04T11:11:26.3943923Z [W1204 11:01:02.114323524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3944575Z 2025-12-04T11:11:26.3945080Z [W1204 11:01:02.116021614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3945731Z 2025-12-04T11:11:26.3946236Z [W1204 11:01:02.116223485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3946887Z 2025-12-04T11:11:26.3947391Z [W1204 11:01:02.120059755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3948030Z 2025-12-04T11:11:26.3948545Z [W1204 11:01:02.120684373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3949188Z 2025-12-04T11:11:26.3949708Z [W1204 11:01:02.120871428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3950343Z 2025-12-04T11:11:26.3950850Z [W1204 11:01:02.126763349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3951501Z 2025-12-04T11:11:26.3952017Z [W1204 11:01:02.127366744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3952665Z 2025-12-04T11:11:26.3953168Z [W1204 11:01:02.127553255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3953803Z 2025-12-04T11:11:26.3953952Z ('RERUN', {'yellow': True}) [19.4920s] [100%] 2025-12-04T11:11:26.3955449Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:01:02.564391588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3956846Z 2025-12-04T11:11:26.3957350Z [W1204 11:01:02.565143641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3958001Z 2025-12-04T11:11:26.3958590Z [W1204 11:01:02.565335597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3959296Z 2025-12-04T11:11:26.3959807Z [W1204 11:01:02.569124949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3960480Z 2025-12-04T11:11:26.3961003Z [W1204 11:01:02.569899636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3961742Z 2025-12-04T11:11:26.3962265Z [W1204 11:01:02.570106817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3962898Z 2025-12-04T11:11:26.3963401Z [W1204 11:01:02.576024100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3964049Z 2025-12-04T11:11:26.3964554Z [W1204 11:01:02.576624696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3965208Z 2025-12-04T11:11:26.3965714Z [W1204 11:01:02.576804818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3966360Z 2025-12-04T11:11:26.3966880Z [W1204 11:01:03.661656996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3967515Z 2025-12-04T11:11:26.3968034Z [W1204 11:01:03.662423732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3968669Z 2025-12-04T11:11:26.3969176Z [W1204 11:01:03.662621634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3969822Z 2025-12-04T11:11:26.3970327Z [W1204 11:01:03.666419258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3970975Z 2025-12-04T11:11:26.3971476Z [W1204 11:01:03.667050083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3972115Z 2025-12-04T11:11:26.3972637Z [W1204 11:01:03.667238592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3973271Z 2025-12-04T11:11:26.3973787Z [W1204 11:01:03.673143161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3974430Z 2025-12-04T11:11:26.3974935Z [W1204 11:01:03.673977344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3975585Z 2025-12-04T11:11:26.3976091Z [W1204 11:01:03.674179117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3976738Z 2025-12-04T11:11:26.3976872Z ('RERUN', {'yellow': True}) [0.5083s] [100%] 2025-12-04T11:11:26.3978390Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:01:03.045772888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3979775Z 2025-12-04T11:11:26.3980292Z [W1204 11:01:03.046535662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3980934Z 2025-12-04T11:11:26.3981448Z [W1204 11:01:03.046729343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3982083Z 2025-12-04T11:11:26.3982670Z [W1204 11:01:03.050556942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3983324Z 2025-12-04T11:11:26.3983824Z [W1204 11:01:03.051335059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3984503Z 2025-12-04T11:11:26.3985009Z [W1204 11:01:03.051519408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3985648Z 2025-12-04T11:11:26.3986171Z [W1204 11:01:03.057381781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3986838Z 2025-12-04T11:11:26.3987365Z [W1204 11:01:03.057986486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3988006Z 2025-12-04T11:11:26.3988505Z [W1204 11:01:03.058180921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3989155Z 2025-12-04T11:11:26.3989657Z [W1204 11:01:03.144682376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3990309Z 2025-12-04T11:11:26.3990817Z [W1204 11:01:03.145459685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3991451Z 2025-12-04T11:11:26.3991968Z [W1204 11:01:03.145662803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3992609Z 2025-12-04T11:11:26.3993121Z [W1204 11:01:03.149534778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3993754Z 2025-12-04T11:11:26.3994255Z [W1204 11:01:03.150199261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3994901Z 2025-12-04T11:11:26.3995409Z [W1204 11:01:03.150393747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3996060Z 2025-12-04T11:11:26.3996559Z [W1204 11:01:03.156275070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3997192Z 2025-12-04T11:11:26.3997706Z [W1204 11:01:03.157071345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.3998342Z 2025-12-04T11:11:26.3998860Z [W1204 11:01:03.157260667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.3999498Z 2025-12-04T11:11:26.3999596Z FAILED [0.4830s] [100%] 2025-12-04T11:11:26.3999779Z 2025-12-04T11:11:26.3999917Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4000708Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4001712Z Traceback (most recent call last): 2025-12-04T11:11:26.4002444Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4003318Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4004146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4004902Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4005712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4006579Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4007046Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4007298Z 2025-12-04T11:11:26.4007565Z Expected 1 but got 2. 
2025-12-04T11:11:26.4007852Z Absolute difference: 1 2025-12-04T11:11:26.4008144Z Relative difference: 1.0 2025-12-04T11:11:26.4008330Z 2025-12-04T11:11:26.4008534Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4009815Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4010850Z 2025-12-04T11:11:26.4011156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4011779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4012236Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4012971Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4013859Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4014321Z graph_break [] 2025-12-04T11:11:26.4014677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4016224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4017656Z if out == self.unknown_value: 2025-12-04T11:11:26.4018597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4019537Z warnings.warn( 2025-12-04T11:11:26.4020407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4021360Z warnings.warn( 2025-12-04T11:11:26.4022016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4022758Z Traceback (most recent call last): 2025-12-04T11:11:26.4023491Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4024358Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4025157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4025907Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4026718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4027582Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4028033Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4028295Z 2025-12-04T11:11:26.4028397Z Expected 1 but got 2. 
2025-12-04T11:11:26.4028679Z Absolute difference: 1 2025-12-04T11:11:26.4028960Z Relative difference: 1.0 2025-12-04T11:11:26.4029157Z 2025-12-04T11:11:26.4029367Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4030591Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4031607Z 2025-12-04T11:11:26.4031885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4032491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4032958Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4033692Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4034645Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4035112Z graph_break [] 2025-12-04T11:11:26.4035486Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4037058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4038520Z if out == self.unknown_value: 2025-12-04T11:11:26.4039464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4040419Z warnings.warn( 2025-12-04T11:11:26.4041293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4042298Z warnings.warn( 2025-12-04T11:11:26.4042678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4043156Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4043584Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4044471Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4045227Z graph_break [] 2025-12-04T11:11:26.4045600Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4046662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4047608Z warnings.warn( 2025-12-04T11:11:26.4048481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4049412Z warnings.warn( 2025-12-04T11:11:26.4049723Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4050512Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4051282Z Traceback (most recent call last): 2025-12-04T11:11:26.4052006Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4052874Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4053685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4054433Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4055242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4056103Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4056567Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4056815Z 2025-12-04T11:11:26.4056918Z Expected 1 but got 2. 2025-12-04T11:11:26.4057205Z Absolute difference: 1 2025-12-04T11:11:26.4057498Z Relative difference: 1.0 2025-12-04T11:11:26.4057682Z 2025-12-04T11:11:26.4057905Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4059136Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4060164Z 2025-12-04T11:11:26.4060427Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4061043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4061615Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4062339Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4063253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4063716Z graph_break [] 2025-12-04T11:11:26.4064071Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.4065607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4067065Z if out == self.unknown_value: 2025-12-04T11:11:26.4067989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4068931Z warnings.warn( 2025-12-04T11:11:26.4069801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4070751Z warnings.warn( 2025-12-04T11:11:26.4071126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4071584Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4072031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4072915Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4073654Z graph_break [] 2025-12-04T11:11:26.4074024Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4075103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4076047Z warnings.warn( 2025-12-04T11:11:26.4076895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4077832Z warnings.warn( 2025-12-04T11:11:26.4078202Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4078654Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4079091Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4079963Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4080712Z graph_break [] 2025-12-04T11:11:26.4081064Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4082211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4083163Z warnings.warn( 2025-12-04T11:11:26.4084029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4084959Z warnings.warn( 2025-12-04T11:11:26.4085941Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml - 2025-12-04T11:11:26.4087068Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4088300Z FAILED [0.4830s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4089336Z 2025-12-04T11:11:26.4089535Z Expected 1 but got 2. 
2025-12-04T11:11:26.4089820Z Absolute difference: 1 2025-12-04T11:11:26.4090111Z Relative difference: 1.0 2025-12-04T11:11:26.4090296Z 2025-12-04T11:11:26.4090504Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4091785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4092935Z 2025-12-04T11:11:26.4093199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4093776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4094274Z ================== 1 failed, 10 deselected, 2 rerun in 20.52s ================== 2025-12-04T11:11:26.4094709Z Got exit code 1 2025-12-04T11:11:26.4095682Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4097024Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.4097998Z W1204 11:01:14.888000 90193 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4099218Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml 2025-12-04T11:11:26.4100177Z ============================= test session starts ============================== 2025-12-04T11:11:26.4101000Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4101579Z cachedir: .pytest_cache 2025-12-04T11:11:26.4102272Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.4103048Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4103381Z configfile: pytest.ini 2025-12-04T11:11:26.4104093Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4104979Z collecting ... collected 58 items / 4 deselected / 54 selected 2025-12-04T11:11:26.4105460Z stepcurrent: skipping 4 already run items. 2025-12-04T11:11:26.4105824Z Running 7 items in this shard 2025-12-04T11:11:26.4106046Z 2025-12-04T11:11:26.4106888Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7669s] [ 14%] 2025-12-04T11:11:26.4108691Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4444s] [ 14%] 2025-12-04T11:11:26.4110416Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4406s] [ 14%] 2025-12-04T11:11:26.4111304Z 2025-12-04T11:11:26.4111460Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4112222Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4112970Z Traceback (most recent call last): 2025-12-04T11:11:26.4113709Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4114565Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4115380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4116134Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4117102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4117963Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4118489Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4118735Z 2025-12-04T11:11:26.4118855Z Expected 1 but got 2. 2025-12-04T11:11:26.4119132Z Absolute difference: 1 2025-12-04T11:11:26.4119428Z Relative difference: 1.0 2025-12-04T11:11:26.4119675Z 2025-12-04T11:11:26.4119884Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4121115Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4122204Z 2025-12-04T11:11:26.4122469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4123096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4123572Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4124662Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4125891Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4126351Z graph_break [] 2025-12-04T11:11:26.4126714Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4127786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4128722Z warnings.warn( 2025-12-04T11:11:26.4129602Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4130543Z warnings.warn( 2025-12-04T11:11:26.4131181Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4131928Z Traceback (most recent call last): 2025-12-04T11:11:26.4132662Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4133530Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4134333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4135079Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4135903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4136781Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4137236Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4137495Z 2025-12-04T11:11:26.4137600Z Expected 1 but got 2. 
2025-12-04T11:11:26.4137884Z Absolute difference: 1 2025-12-04T11:11:26.4138166Z Relative difference: 1.0 2025-12-04T11:11:26.4138362Z 2025-12-04T11:11:26.4138569Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4139792Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4140797Z 2025-12-04T11:11:26.4141072Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4141674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4142137Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4143312Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4144542Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4145043Z graph_break [] 2025-12-04T11:11:26.4145414Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4146493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4147470Z warnings.warn( 2025-12-04T11:11:26.4148349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4149297Z warnings.warn( 2025-12-04T11:11:26.4149673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4150138Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4150572Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4151799Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4152905Z graph_break [] 2025-12-04T11:11:26.4153255Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4154323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4155265Z warnings.warn( 2025-12-04T11:11:26.4156115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4157063Z warnings.warn( 2025-12-04T11:11:26.4157368Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4158144Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4158879Z Traceback (most recent call last): 2025-12-04T11:11:26.4159610Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4160471Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4161269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4162103Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4162927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4163801Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4164259Z 
AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4164519Z 2025-12-04T11:11:26.4164621Z Expected 1 but got 2. 2025-12-04T11:11:26.4164903Z Absolute difference: 1 2025-12-04T11:11:26.4165174Z Relative difference: 1.0 2025-12-04T11:11:26.4165370Z 2025-12-04T11:11:26.4165578Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4166803Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4167811Z 2025-12-04T11:11:26.4168083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4168688Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4169155Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4170367Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4171624Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4172076Z graph_break [] 2025-12-04T11:11:26.4172447Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4173520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4174507Z warnings.warn( 2025-12-04T11:11:26.4175363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4176307Z warnings.warn( 2025-12-04T11:11:26.4176683Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:11:26.4177143Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4177585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4178814Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4179915Z graph_break [] 2025-12-04T11:11:26.4180279Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4181352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4182301Z warnings.warn( 2025-12-04T11:11:26.4183178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4184111Z warnings.warn( 2025-12-04T11:11:26.4184484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4184952Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4185380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4186622Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4187734Z graph_break [] 2025-12-04T11:11:26.4188107Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4189170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4190118Z warnings.warn( 
2025-12-04T11:11:26.4190996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4191938Z warnings.warn( 2025-12-04T11:11:26.4192912Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml - 2025-12-04T11:11:26.4194034Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4195264Z FAILED [0.4406s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4196288Z 2025-12-04T11:11:26.4196402Z Expected 1 but got 2. 2025-12-04T11:11:26.4196675Z Absolute difference: 1 2025-12-04T11:11:26.4196968Z Relative difference: 1.0 2025-12-04T11:11:26.4197154Z 2025-12-04T11:11:26.4197461Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4198672Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4199724Z 2025-12-04T11:11:26.4199983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4200564Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4201309Z =================== 1 failed, 4 deselected, 2 rerun in 4.68s =================== 2025-12-04T11:11:26.4201789Z Got exit code 1 2025-12-04T11:11:26.4202052Z Retrying single test... 
2025-12-04T11:11:26.4202671Z W1204 11:01:34.951000 90362 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4203892Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml 2025-12-04T11:11:26.4204829Z ============================= test session starts ============================== 2025-12-04T11:11:26.4205480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4206074Z cachedir: .pytest_cache 2025-12-04T11:11:26.4206764Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4207517Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4207863Z configfile: pytest.ini 2025-12-04T11:11:26.4208581Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4209448Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4210762Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4211959Z Running 1 items in this shard 2025-12-04T11:11:26.4212164Z 2025-12-04T11:11:26.4213412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:40.977822627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4214786Z 2025-12-04T11:11:26.4215310Z [W1204 11:01:55.001878862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4215953Z 2025-12-04T11:11:26.4216455Z [W1204 11:01:55.002137378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4217099Z 2025-12-04T11:11:26.4217605Z [W1204 11:01:55.009507138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4218250Z 2025-12-04T11:11:26.4218756Z [W1204 11:01:55.010339312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4219390Z 2025-12-04T11:11:26.4219904Z [W1204 11:01:55.010542993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4220537Z 2025-12-04T11:11:26.4221055Z [W1204 11:01:55.017517850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4221685Z 2025-12-04T11:11:26.4222189Z [W1204 11:01:55.018375575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4222835Z 2025-12-04T11:11:26.4223495Z [W1204 11:01:55.018561389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4224152Z 2025-12-04T11:11:26.4224657Z [W1204 11:01:55.155353990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4225340Z 2025-12-04T11:11:26.4225863Z [W1204 11:01:55.157186180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4226501Z 2025-12-04T11:11:26.4227061Z [W1204 11:01:55.157397471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4227697Z 2025-12-04T11:11:26.4228198Z [W1204 11:01:55.161541591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4228846Z 2025-12-04T11:11:26.4229351Z [W1204 11:01:55.162255472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4229997Z 2025-12-04T11:11:26.4230497Z [W1204 11:01:55.162449046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4231135Z 2025-12-04T11:11:26.4231648Z [W1204 11:01:55.168593685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4232280Z 2025-12-04T11:11:26.4232795Z [W1204 11:01:55.169355035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4233429Z 2025-12-04T11:11:26.4233930Z [W1204 11:01:55.169553439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4234580Z 2025-12-04T11:11:26.4234711Z ('RERUN', {'yellow': True}) [18.8635s] [100%] 2025-12-04T11:11:26.4236210Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:55.571798181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4237582Z 2025-12-04T11:11:26.4238086Z [W1204 11:01:55.572551053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4238720Z 2025-12-04T11:11:26.4239234Z [W1204 11:01:55.572751573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4239873Z 2025-12-04T11:11:26.4240383Z [W1204 11:01:55.576698070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4241016Z 2025-12-04T11:11:26.4241611Z [W1204 11:01:55.577314176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4242265Z 2025-12-04T11:11:26.4242765Z [W1204 11:01:55.577500249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4243415Z 2025-12-04T11:11:26.4243915Z [W1204 11:01:55.583615855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4244548Z 2025-12-04T11:11:26.4245062Z [W1204 11:01:55.584260679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4245696Z 2025-12-04T11:11:26.4246215Z [W1204 11:01:55.584443541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4246848Z 2025-12-04T11:11:26.4247346Z [W1204 11:01:56.671311198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4248087Z 2025-12-04T11:11:26.4248586Z [W1204 11:01:56.672069008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4249229Z 2025-12-04T11:11:26.4249765Z [W1204 11:01:56.672266218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4250398Z 2025-12-04T11:11:26.4250910Z [W1204 11:01:56.676115260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4251577Z 2025-12-04T11:11:26.4252092Z [W1204 11:01:56.676742676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4252725Z 2025-12-04T11:11:26.4253227Z [W1204 11:01:56.676931833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4253874Z 2025-12-04T11:11:26.4254376Z [W1204 11:01:56.682872253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4255022Z 2025-12-04T11:11:26.4255521Z [W1204 11:01:56.683703487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4256158Z 2025-12-04T11:11:26.4256670Z [W1204 11:01:56.683892549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4257304Z 2025-12-04T11:11:26.4257446Z ('RERUN', {'yellow': True}) [0.4738s] [100%] 2025-12-04T11:11:26.4258920Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:56.022280564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4260298Z 2025-12-04T11:11:26.4260808Z [W1204 11:01:56.023015910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4261460Z 2025-12-04T11:11:26.4261963Z [W1204 11:01:56.023209421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4262598Z 2025-12-04T11:11:26.4263115Z [W1204 11:01:56.027144697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4263747Z 2025-12-04T11:11:26.4264259Z [W1204 11:01:56.027760459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4264888Z 2025-12-04T11:11:26.4265390Z [W1204 11:01:56.027945369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4266035Z 2025-12-04T11:11:26.4266540Z [W1204 11:01:56.033998053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4267185Z 2025-12-04T11:11:26.4267687Z [W1204 11:01:56.034643585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4268336Z 2025-12-04T11:11:26.4268833Z [W1204 11:01:56.034824921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4269469Z 2025-12-04T11:11:26.4269982Z [W1204 11:01:56.123174276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4270610Z 2025-12-04T11:11:26.4271123Z [W1204 11:01:56.123961593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4271756Z 2025-12-04T11:11:26.4272317Z [W1204 11:01:56.124165426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4272961Z 2025-12-04T11:11:26.4273464Z [W1204 11:01:56.128127390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4274138Z 2025-12-04T11:11:26.4274640Z [W1204 11:01:56.128802321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4275306Z 2025-12-04T11:11:26.4275821Z [W1204 11:01:56.128994523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4276457Z 2025-12-04T11:11:26.4276973Z [W1204 11:01:56.134997557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4277602Z 2025-12-04T11:11:26.4278111Z [W1204 11:01:56.135883643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4278754Z 2025-12-04T11:11:26.4279257Z [W1204 11:01:56.136073684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4279908Z 2025-12-04T11:11:26.4280008Z FAILED [0.4510s] [100%] 2025-12-04T11:11:26.4280181Z 2025-12-04T11:11:26.4280337Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4281104Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4281917Z Traceback (most recent call last): 2025-12-04T11:11:26.4282657Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4283526Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4284336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4285095Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4285926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4286787Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4287260Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4287525Z 2025-12-04T11:11:26.4287633Z Expected 1 but got 2. 
2025-12-04T11:11:26.4287916Z Absolute difference: 1 2025-12-04T11:11:26.4288196Z Relative difference: 1.0 2025-12-04T11:11:26.4288397Z 2025-12-04T11:11:26.4288604Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4289832Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4290845Z 2025-12-04T11:11:26.4291128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4291740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4292215Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4293307Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4294540Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4294986Z graph_break [] 2025-12-04T11:11:26.4295355Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4296966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4298375Z if out == self.unknown_value: 2025-12-04T11:11:26.4299303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4300289Z warnings.warn( 2025-12-04T11:11:26.4301345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4302348Z warnings.warn( 2025-12-04T11:11:26.4303002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4303753Z Traceback (most recent call last): 2025-12-04T11:11:26.4304483Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4305337Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4306149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4306900Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4307709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4308576Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4309042Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4309288Z 2025-12-04T11:11:26.4309404Z Expected 1 but got 2. 
2025-12-04T11:11:26.4309674Z Absolute difference: 1 2025-12-04T11:11:26.4309964Z Relative difference: 1.0 2025-12-04T11:11:26.4310149Z 2025-12-04T11:11:26.4310370Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4311585Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4312603Z 2025-12-04T11:11:26.4312865Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4313486Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4313957Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4315026Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4316257Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4316717Z graph_break [] 2025-12-04T11:11:26.4317087Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4318615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4320040Z if out == self.unknown_value: 2025-12-04T11:11:26.4320977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4321992Z warnings.warn( 2025-12-04T11:11:26.4322850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4323796Z warnings.warn( 2025-12-04T11:11:26.4324170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4324638Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4325061Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4326401Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4327558Z graph_break [] 2025-12-04T11:11:26.4327908Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4328975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4329957Z warnings.warn( 2025-12-04T11:11:26.4330825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4331757Z warnings.warn( 2025-12-04T11:11:26.4332062Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4332847Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4333590Z 
Traceback (most recent call last): 2025-12-04T11:11:26.4334314Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4335187Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4335991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4336725Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4337544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4338408Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4338871Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4339118Z 2025-12-04T11:11:26.4339229Z Expected 1 but got 2. 2025-12-04T11:11:26.4339514Z Absolute difference: 1 2025-12-04T11:11:26.4339805Z Relative difference: 1.0 2025-12-04T11:11:26.4339987Z 2025-12-04T11:11:26.4340197Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4341422Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4342436Z 2025-12-04T11:11:26.4342696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4343313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4343766Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4344854Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4346077Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.4346536Z graph_break [] 2025-12-04T11:11:26.4346886Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4348420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4349854Z if out == self.unknown_value: 2025-12-04T11:11:26.4350793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4351733Z warnings.warn( 2025-12-04T11:11:26.4352705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4353656Z warnings.warn( 2025-12-04T11:11:26.4354035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4354526Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4354967Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4355836Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4355982Z graph_break [] 2025-12-04T11:11:26.4356198Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4356927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4357028Z warnings.warn( 2025-12-04T11:11:26.4357735Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4357848Z warnings.warn( 2025-12-04T11:11:26.4358063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4358177Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4358413Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4359285Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4359394Z graph_break [] 2025-12-04T11:11:26.4359606Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4360321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4360432Z warnings.warn( 2025-12-04T11:11:26.4361132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4361245Z warnings.warn( 2025-12-04T11:11:26.4362144Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml - 2025-12-04T11:11:26.4362319Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4363245Z FAILED [0.4510s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.4363251Z 2025-12-04T11:11:26.4363359Z Expected 1 but got 2. 2025-12-04T11:11:26.4363480Z Absolute difference: 1 2025-12-04T11:11:26.4363585Z Relative difference: 1.0 2025-12-04T11:11:26.4363590Z 2025-12-04T11:11:26.4363802Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4364701Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4364706Z 2025-12-04T11:11:26.4364968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4365155Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4365348Z ================== 1 failed, 10 deselected, 2 rerun in 19.82s ================== 2025-12-04T11:11:26.4365443Z Got exit code 1 2025-12-04T11:11:26.4365561Z Retrying single test... 2025-12-04T11:11:26.4366066Z W1204 11:02:07.933000 90536 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4366717Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml 2025-12-04T11:11:26.4366927Z ============================= test session starts ============================== 2025-12-04T11:11:26.4367272Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4367389Z cachedir: .pytest_cache 2025-12-04T11:11:26.4367930Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4368052Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4368168Z configfile: pytest.ini 2025-12-04T11:11:26.4368700Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4368918Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4369886Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4370000Z Running 1 items in this shard 2025-12-04T11:11:26.4370005Z 2025-12-04T11:11:26.4371251Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:13.903991753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4371259Z 2025-12-04T11:11:26.4371767Z [W1204 11:02:28.615233474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4371772Z 2025-12-04T11:11:26.4372292Z [W1204 11:02:28.615492186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4372297Z 2025-12-04T11:11:26.4372794Z [W1204 11:02:29.622862288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4372802Z 2025-12-04T11:11:26.4373314Z [W1204 11:02:29.623607436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4373319Z 2025-12-04T11:11:26.4373817Z [W1204 11:02:29.623802286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4373821Z 2025-12-04T11:11:26.4374320Z [W1204 11:02:29.630757437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4374337Z 2025-12-04T11:11:26.4374839Z [W1204 11:02:29.631581072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4374844Z 2025-12-04T11:11:26.4375342Z [W1204 11:02:29.631769906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4375349Z 2025-12-04T11:11:26.4375860Z [W1204 11:02:29.766370804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4375864Z 2025-12-04T11:11:26.4376361Z [W1204 11:02:29.768093826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4376368Z 2025-12-04T11:11:26.4376878Z [W1204 11:02:29.768294765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4376883Z 2025-12-04T11:11:26.4377434Z [W1204 11:02:29.772229945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4377439Z 2025-12-04T11:11:26.4377952Z [W1204 11:02:29.772886409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4377989Z 2025-12-04T11:11:26.4378485Z [W1204 11:02:29.773078770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4378491Z 2025-12-04T11:11:26.4379004Z [W1204 11:02:29.779024015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4379061Z 2025-12-04T11:11:26.4379560Z [W1204 11:02:29.779657627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4379564Z 2025-12-04T11:11:26.4380061Z [W1204 11:02:29.779849749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4380083Z 2025-12-04T11:11:26.4380212Z ('RERUN', {'yellow': True}) [19.4936s] [100%] 2025-12-04T11:11:26.4381451Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:29.179029099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4381459Z 2025-12-04T11:11:26.4381971Z [W1204 11:02:29.179773396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4381978Z 2025-12-04T11:11:26.4382472Z [W1204 11:02:29.179969761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4382477Z 2025-12-04T11:11:26.4382988Z [W1204 11:02:29.183911115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4382996Z 2025-12-04T11:11:26.4383493Z [W1204 11:02:29.184526715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4383498Z 2025-12-04T11:11:26.4384009Z [W1204 11:02:29.184712306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4384014Z 2025-12-04T11:11:26.4384508Z [W1204 11:02:29.190781903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4384515Z 2025-12-04T11:11:26.4385010Z [W1204 11:02:29.191394887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4385027Z 2025-12-04T11:11:26.4385523Z [W1204 11:02:29.191577756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4385527Z 2025-12-04T11:11:26.4386029Z [W1204 11:02:29.278343259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4386034Z 2025-12-04T11:11:26.4386544Z [W1204 11:02:29.279120334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4386551Z 2025-12-04T11:11:26.4387050Z [W1204 11:02:29.279329305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4387057Z 2025-12-04T11:11:26.4387564Z [W1204 11:02:29.283250241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4387568Z 2025-12-04T11:11:26.4388074Z [W1204 11:02:29.283881956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4388079Z 2025-12-04T11:11:26.4389365Z [W1204 11:02:29.284076030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4389370Z 2025-12-04T11:11:26.4389879Z [W1204 11:02:29.289969916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4389914Z 2025-12-04T11:11:26.4390434Z [W1204 11:02:29.290782534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4390488Z 2025-12-04T11:11:26.4390992Z [W1204 11:02:29.290978451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4390997Z 2025-12-04T11:11:26.4391124Z ('RERUN', {'yellow': True}) [0.4714s] [100%] 2025-12-04T11:11:26.4392386Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:30.625002528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4392392Z 2025-12-04T11:11:26.4392894Z [W1204 11:02:30.625710320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4392900Z 2025-12-04T11:11:26.4393415Z [W1204 11:02:30.625903932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4393422Z 2025-12-04T11:11:26.4393922Z [W1204 11:02:30.629791944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4393927Z 2025-12-04T11:11:26.4394438Z [W1204 11:02:30.630461221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4394443Z 2025-12-04T11:11:26.4394950Z [W1204 11:02:30.630651561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4394954Z 2025-12-04T11:11:26.4395470Z [W1204 11:02:30.636554565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4395477Z 2025-12-04T11:11:26.4395976Z [W1204 11:02:30.637183459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4395983Z 2025-12-04T11:11:26.4396495Z [W1204 11:02:30.637368777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4396500Z 2025-12-04T11:11:26.4396998Z [W1204 11:02:30.723148759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4397003Z 2025-12-04T11:11:26.4397504Z [W1204 11:02:30.723903833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4397520Z 2025-12-04T11:11:26.4398014Z [W1204 11:02:30.724105880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4398022Z 2025-12-04T11:11:26.4398521Z [W1204 11:02:30.728055583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4398525Z 2025-12-04T11:11:26.4399039Z [W1204 11:02:30.728709729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4399044Z 2025-12-04T11:11:26.4399541Z [W1204 11:02:30.728909582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4399545Z 2025-12-04T11:11:26.4400108Z [W1204 11:02:30.734879426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4400113Z 2025-12-04T11:11:26.4400608Z [W1204 11:02:30.735685099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4400642Z 2025-12-04T11:11:26.4401392Z [W1204 11:02:30.735874321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4401397Z 2025-12-04T11:11:26.4401556Z FAILED [0.4438s] [100%] 2025-12-04T11:11:26.4401632Z 2025-12-04T11:11:26.4401780Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4402283Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4402402Z Traceback (most recent call last): 2025-12-04T11:11:26.4402920Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4403150Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4403602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4403780Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4404305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4404524Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4404653Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4404658Z 2025-12-04T11:11:26.4404761Z Expected 1 but got 2. 
2025-12-04T11:11:26.4404884Z Absolute difference: 1 2025-12-04T11:11:26.4404991Z Relative difference: 1.0 2025-12-04T11:11:26.4404995Z 2025-12-04T11:11:26.4405203Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4406096Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4406102Z 2025-12-04T11:11:26.4406366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4406593Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4406706Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4407577Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4407816Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4407911Z graph_break [] 2025-12-04T11:11:26.4408134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4409318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4409433Z if out == self.unknown_value: 2025-12-04T11:11:26.4410161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4410261Z warnings.warn( 2025-12-04T11:11:26.4410979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4411076Z warnings.warn( 2025-12-04T11:11:26.4411567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4411698Z Traceback (most recent call last): 2025-12-04T11:11:26.4412281Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4412511Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4413015Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4413175Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4413714Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4413951Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4414080Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4414085Z 2025-12-04T11:11:26.4414202Z Expected 1 but got 2. 
2025-12-04T11:11:26.4414308Z Absolute difference: 1 2025-12-04T11:11:26.4414414Z Relative difference: 1.0 2025-12-04T11:11:26.4414434Z 2025-12-04T11:11:26.4414647Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4415525Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4415533Z 2025-12-04T11:11:26.4415810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4416023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4416139Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4417020Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4417241Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4417351Z graph_break [] 2025-12-04T11:11:26.4417567Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4418745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4418874Z if out == self.unknown_value: 2025-12-04T11:11:26.4419581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4419694Z warnings.warn( 2025-12-04T11:11:26.4420398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4420495Z warnings.warn( 2025-12-04T11:11:26.4420723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4420833Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4421068Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4421940Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4422039Z graph_break [] 2025-12-04T11:11:26.4422261Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4422973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4423070Z warnings.warn( 2025-12-04T11:11:26.4423785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4423945Z warnings.warn( 2025-12-04T11:11:26.4424102Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4424593Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4424876Z 
Traceback (most recent call last): 2025-12-04T11:11:26.4425390Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4425649Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4426116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4426279Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4426804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4427028Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4427157Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4427162Z 2025-12-04T11:11:26.4427266Z Expected 1 but got 2. 2025-12-04T11:11:26.4427388Z Absolute difference: 1 2025-12-04T11:11:26.4427497Z Relative difference: 1.0 2025-12-04T11:11:26.4427502Z 2025-12-04T11:11:26.4427728Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4428605Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4428613Z 2025-12-04T11:11:26.4428875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4429106Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4429221Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4430117Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4430341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.4430438Z graph_break [] 2025-12-04T11:11:26.4430667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4431847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4431977Z if out == self.unknown_value: 2025-12-04T11:11:26.4432689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4432792Z warnings.warn( 2025-12-04T11:11:26.4433515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4433614Z warnings.warn( 2025-12-04T11:11:26.4433827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4433955Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4434182Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4435063Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4435159Z graph_break [] 2025-12-04T11:11:26.4435371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4436148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4436246Z warnings.warn( 2025-12-04T11:11:26.4436969Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4437113Z warnings.warn( 2025-12-04T11:11:26.4437323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4437476Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4437698Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4438570Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4438679Z graph_break [] 2025-12-04T11:11:26.4438895Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4439616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4439717Z warnings.warn( 2025-12-04T11:11:26.4440418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4440530Z warnings.warn( 2025-12-04T11:11:26.4441356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml - 2025-12-04T11:11:26.4441607Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4442521Z FAILED [0.4438s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.4442527Z 2025-12-04T11:11:26.4442632Z Expected 1 but got 2. 2025-12-04T11:11:26.4442751Z Absolute difference: 1 2025-12-04T11:11:26.4442859Z Relative difference: 1.0 2025-12-04T11:11:26.4442864Z 2025-12-04T11:11:26.4443088Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4443966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4443973Z 2025-12-04T11:11:26.4444236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4444425Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4444616Z ================== 1 failed, 10 deselected, 2 rerun in 20.44s ================== 2025-12-04T11:11:26.4444730Z Got exit code 1 2025-12-04T11:11:26.4445524Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4445931Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.4446378Z W1204 11:02:41.168000 90710 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4447022Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml 2025-12-04T11:11:26.4447204Z ============================= test session starts ============================== 2025-12-04T11:11:26.4447547Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4447656Z cachedir: .pytest_cache 2025-12-04T11:11:26.4448253Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4448378Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4448484Z configfile: pytest.ini 2025-12-04T11:11:26.4449061Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4449271Z collecting ... collected 58 items / 5 deselected / 53 selected 2025-12-04T11:11:26.4449456Z stepcurrent: skipping 5 already run items. 2025-12-04T11:11:26.4449568Z Running 6 items in this shard 2025-12-04T11:11:26.4449573Z 2025-12-04T11:11:26.4450417Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7628s] [ 16%] 2025-12-04T11:11:26.4451269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4392s] [ 16%] 2025-12-04T11:11:26.4452018Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4369s] [ 16%] 2025-12-04T11:11:26.4452025Z 2025-12-04T11:11:26.4452182Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4452674Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4452795Z Traceback (most recent call last): 2025-12-04T11:11:26.4453313Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4453537Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4454008Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4454167Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4454692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4454909Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4455036Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4455041Z 2025-12-04T11:11:26.4455157Z Expected 1 but got 2. 2025-12-04T11:11:26.4455264Z Absolute difference: 1 2025-12-04T11:11:26.4455371Z Relative difference: 1.0 2025-12-04T11:11:26.4455376Z 2025-12-04T11:11:26.4455597Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4456478Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4456483Z 2025-12-04T11:11:26.4456759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4456975Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4457092Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4457971Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4458197Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4458292Z graph_break [] 2025-12-04T11:11:26.4458520Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4459241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does 
not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4459353Z warnings.warn( 2025-12-04T11:11:26.4460115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4460216Z warnings.warn( 2025-12-04T11:11:26.4460747Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4460865Z Traceback (most recent call last): 2025-12-04T11:11:26.4461362Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4461630Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4462076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4462248Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4462772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4462975Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4463120Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4463128Z 2025-12-04T11:11:26.4463230Z Expected 1 but got 2. 
2025-12-04T11:11:26.4463346Z Absolute difference: 1 2025-12-04T11:11:26.4463452Z Relative difference: 1.0 2025-12-04T11:11:26.4463457Z 2025-12-04T11:11:26.4463666Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4464559Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4464563Z 2025-12-04T11:11:26.4464827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4465052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4465170Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4466036Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4466269Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4466368Z graph_break [] 2025-12-04T11:11:26.4466579Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4467310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4467408Z warnings.warn( 2025-12-04T11:11:26.4468130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4468230Z warnings.warn( 2025-12-04T11:11:26.4468440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4468564Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4468785Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4469669Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4469765Z graph_break [] 2025-12-04T11:11:26.4469974Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4470698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4470793Z warnings.warn( 2025-12-04T11:11:26.4471559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4471672Z warnings.warn( 2025-12-04T11:11:26.4471813Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4472347Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4472463Z Traceback (most recent call last): 2025-12-04T11:11:26.4472959Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4473233Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4473680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4473854Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4474383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4474582Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4474722Z 
AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4474729Z 2025-12-04T11:11:26.4474834Z Expected 1 but got 2. 2025-12-04T11:11:26.4474938Z Absolute difference: 1 2025-12-04T11:11:26.4475056Z Relative difference: 1.0 2025-12-04T11:11:26.4475060Z 2025-12-04T11:11:26.4475272Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4476166Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4476171Z 2025-12-04T11:11:26.4476434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4476649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4476775Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4477642Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4477877Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4477973Z graph_break [] 2025-12-04T11:11:26.4478184Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4478918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4479016Z warnings.warn( 2025-12-04T11:11:26.4479738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4479839Z warnings.warn( 2025-12-04T11:11:26.4480052Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:11:26.4480176Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4480400Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4481264Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4481374Z graph_break [] 2025-12-04T11:11:26.4481653Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4482379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4482475Z warnings.warn( 2025-12-04T11:11:26.4483265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4483376Z warnings.warn( 2025-12-04T11:11:26.4483585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4483728Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4483961Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4484831Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4484970Z graph_break [] 2025-12-04T11:11:26.4485177Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4485894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4486004Z warnings.warn( 
2025-12-04T11:11:26.4486709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4486818Z warnings.warn( 2025-12-04T11:11:26.4487642Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml - 2025-12-04T11:11:26.4487814Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4488747Z FAILED [0.4369s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4488752Z 2025-12-04T11:11:26.4488856Z Expected 1 but got 2. 2025-12-04T11:11:26.4488982Z Absolute difference: 1 2025-12-04T11:11:26.4489090Z Relative difference: 1.0 2025-12-04T11:11:26.4489094Z 2025-12-04T11:11:26.4489305Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4490198Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4490206Z 2025-12-04T11:11:26.4490469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4490664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4490856Z =================== 1 failed, 5 deselected, 2 rerun in 4.67s =================== 2025-12-04T11:11:26.4490954Z Got exit code 1 2025-12-04T11:11:26.4491073Z Retrying single test... 
2025-12-04T11:11:26.4491516Z W1204 11:03:00.783000 90879 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4492161Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml 2025-12-04T11:11:26.4492339Z ============================= test session starts ============================== 2025-12-04T11:11:26.4492686Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4492810Z cachedir: .pytest_cache 2025-12-04T11:11:26.4493321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4493447Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4493570Z configfile: pytest.ini 2025-12-04T11:11:26.4494103Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4494336Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4495364Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4495507Z Running 1 items in this shard 2025-12-04T11:11:26.4495512Z 2025-12-04T11:11:26.4496776Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:06.748836219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4496815Z 2025-12-04T11:11:26.4497326Z [W1204 11:03:21.404956419 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4497331Z 2025-12-04T11:11:26.4497852Z [W1204 11:03:21.405208430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4497858Z 2025-12-04T11:11:26.4498359Z [W1204 11:03:21.412347114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4498366Z 2025-12-04T11:11:26.4498878Z [W1204 11:03:21.413016398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4498883Z 2025-12-04T11:11:26.4499383Z [W1204 11:03:21.413202601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4499390Z 2025-12-04T11:11:26.4499901Z [W1204 11:03:21.419902114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4499906Z 2025-12-04T11:11:26.4500407Z [W1204 11:03:21.420666810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4500415Z 2025-12-04T11:11:26.4501092Z [W1204 11:03:21.420852983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4501113Z 2025-12-04T11:11:26.4501620Z [W1204 11:03:21.552806132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4501624Z 2025-12-04T11:11:26.4502128Z [W1204 11:03:21.554519524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4502135Z 2025-12-04T11:11:26.4502648Z [W1204 11:03:21.554735479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4502652Z 2025-12-04T11:11:26.4503152Z [W1204 11:03:21.558557259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4503160Z 2025-12-04T11:11:26.4503677Z [W1204 11:03:21.559185258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4503682Z 2025-12-04T11:11:26.4504184Z [W1204 11:03:21.559375200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4504188Z 2025-12-04T11:11:26.4504701Z [W1204 11:03:21.565220358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4504708Z 2025-12-04T11:11:26.4505210Z [W1204 11:03:21.565828492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4505214Z 2025-12-04T11:11:26.4505732Z [W1204 11:03:21.566015789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4505737Z 2025-12-04T11:11:26.4506024Z ('RERUN', {'yellow': True}) [19.4420s] [100%] 2025-12-04T11:11:26.4507270Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:22.959910785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4507320Z 2025-12-04T11:11:26.4507840Z [W1204 11:03:22.960696728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4507884Z 2025-12-04T11:11:26.4508388Z [W1204 11:03:22.960902474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4508393Z 2025-12-04T11:11:26.4508913Z [W1204 11:03:22.964851671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4508918Z 2025-12-04T11:11:26.4509425Z [W1204 11:03:22.965483441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4509429Z 2025-12-04T11:11:26.4509939Z [W1204 11:03:22.965672615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4509946Z 2025-12-04T11:11:26.4510439Z [W1204 11:03:22.971752252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4510448Z 2025-12-04T11:11:26.4510957Z [W1204 11:03:22.972365182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4510962Z 2025-12-04T11:11:26.4511458Z [W1204 11:03:22.972549078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4511462Z 2025-12-04T11:11:26.4511961Z [W1204 11:03:22.058594136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4511979Z 2025-12-04T11:11:26.4512475Z [W1204 11:03:22.059351734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4512483Z 2025-12-04T11:11:26.4512977Z [W1204 11:03:22.059556938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4512984Z 2025-12-04T11:11:26.4513486Z [W1204 11:03:22.063438320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4513491Z 2025-12-04T11:11:26.4513986Z [W1204 11:03:22.064075460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4513990Z 2025-12-04T11:11:26.4514501Z [W1204 11:03:22.064269381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4514506Z 2025-12-04T11:11:26.4515005Z [W1204 11:03:22.070241698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4515011Z 2025-12-04T11:11:26.4515524Z [W1204 11:03:22.071051380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4515529Z 2025-12-04T11:11:26.4516030Z [W1204 11:03:22.071243549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4516035Z 2025-12-04T11:11:26.4516177Z ('RERUN', {'yellow': True}) [0.4673s] [100%] 2025-12-04T11:11:26.4517464Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:22.402212629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4517470Z 2025-12-04T11:11:26.4517971Z [W1204 11:03:22.402914455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4518020Z 2025-12-04T11:11:26.4518519Z [W1204 11:03:22.403107773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4518551Z 2025-12-04T11:11:26.4519050Z [W1204 11:03:22.406973279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4519055Z 2025-12-04T11:11:26.4519561Z [W1204 11:03:22.407565278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4519565Z 2025-12-04T11:11:26.4520068Z [W1204 11:03:22.407750171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4520072Z 2025-12-04T11:11:26.4520587Z [W1204 11:03:22.413775335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4520593Z 2025-12-04T11:11:26.4521091Z [W1204 11:03:22.414435087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4521097Z 2025-12-04T11:11:26.4521674Z [W1204 11:03:22.414622748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4521681Z 2025-12-04T11:11:26.4522178Z [W1204 11:03:22.500771533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4522182Z 2025-12-04T11:11:26.4522684Z [W1204 11:03:22.501486234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4522702Z 2025-12-04T11:11:26.4523196Z [W1204 11:03:22.501684645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4523203Z 2025-12-04T11:11:26.4523700Z [W1204 11:03:22.505450890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4523705Z 2025-12-04T11:11:26.4524214Z [W1204 11:03:22.506052933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4524220Z 2025-12-04T11:11:26.4524719Z [W1204 11:03:22.506257153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4524724Z 2025-12-04T11:11:26.4525237Z [W1204 11:03:22.512070114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4525241Z 2025-12-04T11:11:26.4525737Z [W1204 11:03:22.512829068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4525744Z 2025-12-04T11:11:26.4526250Z [W1204 11:03:22.513019757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4526254Z 2025-12-04T11:11:26.4526353Z FAILED [0.4395s] [100%] 2025-12-04T11:11:26.4526358Z 2025-12-04T11:11:26.4526502Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4527002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4527122Z Traceback (most recent call last): 2025-12-04T11:11:26.4527631Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4527939Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4528396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4528600Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4529123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4529338Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4529501Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4529506Z 2025-12-04T11:11:26.4529611Z Expected 1 but got 2. 
2025-12-04T11:11:26.4529732Z Absolute difference: 1 2025-12-04T11:11:26.4529842Z Relative difference: 1.0 2025-12-04T11:11:26.4529847Z 2025-12-04T11:11:26.4530056Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4530960Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4530965Z 2025-12-04T11:11:26.4531228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4531463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4531578Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4532449Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4532684Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4532781Z graph_break [] 2025-12-04T11:11:26.4533004Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4534190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4534309Z if out == self.unknown_value: 2025-12-04T11:11:26.4535036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4535134Z warnings.warn( 2025-12-04T11:11:26.4535858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4535955Z warnings.warn( 2025-12-04T11:11:26.4536447Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4536581Z Traceback (most recent call last): 2025-12-04T11:11:26.4537084Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4537321Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4537765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4537928Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4538465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4538665Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4538793Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4538798Z 2025-12-04T11:11:26.4538913Z Expected 1 but got 2. 
2025-12-04T11:11:26.4539017Z Absolute difference: 1 2025-12-04T11:11:26.4539135Z Relative difference: 1.0 2025-12-04T11:11:26.4539140Z 2025-12-04T11:11:26.4539406Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4540289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4540323Z 2025-12-04T11:11:26.4540597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4540812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4540966Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4541832Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4542053Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4542161Z graph_break [] 2025-12-04T11:11:26.4542379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4543578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4543694Z if out == self.unknown_value: 2025-12-04T11:11:26.4544406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4544519Z warnings.warn( 2025-12-04T11:11:26.4545221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4545316Z warnings.warn( 2025-12-04T11:11:26.4545541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4545657Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4545892Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4546756Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4546851Z graph_break [] 2025-12-04T11:11:26.4547073Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4547782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4547890Z warnings.warn( 2025-12-04T11:11:26.4548594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4548688Z warnings.warn( 2025-12-04T11:11:26.4548846Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4549340Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4549459Z 
Traceback (most recent call last): 2025-12-04T11:11:26.4549967Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4550194Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4550657Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4550821Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4551344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4551615Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4551744Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4551750Z 2025-12-04T11:11:26.4551866Z Expected 1 but got 2. 2025-12-04T11:11:26.4551974Z Absolute difference: 1 2025-12-04T11:11:26.4552111Z Relative difference: 1.0 2025-12-04T11:11:26.4552116Z 2025-12-04T11:11:26.4552342Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4553220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4553256Z 2025-12-04T11:11:26.4553519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4553747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4553859Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4554752Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4554973Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.4555072Z graph_break [] 2025-12-04T11:11:26.4555297Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4556470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4556599Z if out == self.unknown_value: 2025-12-04T11:11:26.4557309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4557480Z warnings.warn( 2025-12-04T11:11:26.4558295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4567894Z warnings.warn( 2025-12-04T11:11:26.4568259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4568401Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4568644Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4569564Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4569683Z graph_break [] 2025-12-04T11:11:26.4569910Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4570678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4570786Z warnings.warn( 2025-12-04T11:11:26.4571520Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4571637Z warnings.warn( 2025-12-04T11:11:26.4571859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4571975Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4572225Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4573130Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4573245Z graph_break [] 2025-12-04T11:11:26.4573608Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4574475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4574630Z warnings.warn( 2025-12-04T11:11:26.4575330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4575426Z warnings.warn( 2025-12-04T11:11:26.4576293Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml - 2025-12-04T11:11:26.4576475Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4577400Z FAILED [0.4395s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.4577409Z 2025-12-04T11:11:26.4577529Z Expected 1 but got 2. 2025-12-04T11:11:26.4577632Z Absolute difference: 1 2025-12-04T11:11:26.4577737Z Relative difference: 1.0 2025-12-04T11:11:26.4577759Z 2025-12-04T11:11:26.4577975Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4578855Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4578863Z 2025-12-04T11:11:26.4579140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4579319Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4579512Z ================== 1 failed, 10 deselected, 2 rerun in 20.38s ================== 2025-12-04T11:11:26.4579623Z Got exit code 1 2025-12-04T11:11:26.4579734Z Retrying single test... 2025-12-04T11:11:26.4580188Z W1204 11:03:33.912000 91053 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4580837Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml 2025-12-04T11:11:26.4581002Z ============================= test session starts ============================== 2025-12-04T11:11:26.4581360Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4581468Z cachedir: .pytest_cache 2025-12-04T11:11:26.4581994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4582116Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4582221Z configfile: pytest.ini 2025-12-04T11:11:26.4582765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4582982Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4583941Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4584069Z Running 1 items in this shard 2025-12-04T11:11:26.4584074Z 2025-12-04T11:11:26.4585318Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:39.870758741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4585324Z 2025-12-04T11:11:26.4585915Z [W1204 11:03:54.427769773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4585921Z 2025-12-04T11:11:26.4586426Z [W1204 11:03:54.428025975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4586463Z 2025-12-04T11:11:26.4586976Z [W1204 11:03:54.435202931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4586981Z 2025-12-04T11:11:26.4587478Z [W1204 11:03:54.435908744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4587516Z 2025-12-04T11:11:26.4588028Z [W1204 11:03:54.436097008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4588032Z 2025-12-04T11:11:26.4588529Z [W1204 11:03:54.442828957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4588539Z 2025-12-04T11:11:26.4589052Z [W1204 11:03:54.443571676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4589057Z 2025-12-04T11:11:26.4589556Z [W1204 11:03:54.443751494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4589561Z 2025-12-04T11:11:26.4590060Z [W1204 11:03:54.575144107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4590067Z 2025-12-04T11:11:26.4590583Z [W1204 11:03:54.576850952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4590588Z 2025-12-04T11:11:26.4591090Z [W1204 11:03:54.577054720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4591094Z 2025-12-04T11:11:26.4591611Z [W1204 11:03:54.580937635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4591615Z 2025-12-04T11:11:26.4592115Z [W1204 11:03:54.581568770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4592122Z 2025-12-04T11:11:26.4592633Z [W1204 11:03:54.581761560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4592639Z 2025-12-04T11:11:26.4593136Z [W1204 11:03:54.587707393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4593140Z 2025-12-04T11:11:26.4593651Z [W1204 11:03:54.588362514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4593656Z 2025-12-04T11:11:26.4594155Z [W1204 11:03:54.588551033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4594160Z 2025-12-04T11:11:26.4594289Z ('RERUN', {'yellow': True}) [19.3430s] [100%] 2025-12-04T11:11:26.4595541Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:55.978981816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4595549Z 2025-12-04T11:11:26.4596049Z [W1204 11:03:55.979701023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4596053Z 2025-12-04T11:11:26.4596562Z [W1204 11:03:55.979892191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4596567Z 2025-12-04T11:11:26.4597134Z [W1204 11:03:55.983766657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4597139Z 2025-12-04T11:11:26.4597654Z [W1204 11:03:55.984366204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4597687Z 2025-12-04T11:11:26.4598182Z [W1204 11:03:55.984551403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4598304Z 2025-12-04T11:11:26.4598814Z [W1204 11:03:55.990481650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4598819Z 2025-12-04T11:11:26.4599314Z [W1204 11:03:55.991081437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4599319Z 2025-12-04T11:11:26.4599837Z [W1204 11:03:55.991265748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4599841Z 2025-12-04T11:11:26.4600339Z [W1204 11:03:55.076311372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4600345Z 2025-12-04T11:11:26.4601062Z [W1204 11:03:55.077047215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4601081Z 2025-12-04T11:11:26.4601640Z [W1204 11:03:55.077253938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4601646Z 2025-12-04T11:11:26.4602143Z [W1204 11:03:55.081107706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4602148Z 2025-12-04T11:11:26.4602661Z [W1204 11:03:55.081727993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4602666Z 2025-12-04T11:11:26.4603165Z [W1204 11:03:55.081918302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4603172Z 2025-12-04T11:11:26.4603682Z [W1204 11:03:55.087821826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4603687Z 2025-12-04T11:11:26.4604189Z [W1204 11:03:55.088580956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4604193Z 2025-12-04T11:11:26.4604704Z [W1204 11:03:55.088768053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4604709Z 2025-12-04T11:11:26.4604834Z ('RERUN', {'yellow': True}) [0.4607s] [100%] 2025-12-04T11:11:26.4606075Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:55.423006532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4606095Z 2025-12-04T11:11:26.4606591Z [W1204 11:03:55.423708090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4606595Z 2025-12-04T11:11:26.4607094Z [W1204 11:03:55.423902837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4607099Z 2025-12-04T11:11:26.4607604Z [W1204 11:03:55.427842342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4607608Z 2025-12-04T11:11:26.4608249Z [W1204 11:03:55.428440610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4608255Z 2025-12-04T11:11:26.4608767Z [W1204 11:03:55.428624855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4608835Z 2025-12-04T11:11:26.4609334Z [W1204 11:03:55.434628162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4609339Z 2025-12-04T11:11:26.4609850Z [W1204 11:03:55.435223641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4609903Z 2025-12-04T11:11:26.4610400Z [W1204 11:03:55.435406363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4610405Z 2025-12-04T11:11:26.4610922Z [W1204 11:03:55.520618941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4610926Z 2025-12-04T11:11:26.4611426Z [W1204 11:03:55.521370834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4611433Z 2025-12-04T11:11:26.4611930Z [W1204 11:03:55.521570526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4611948Z 2025-12-04T11:11:26.4612439Z [W1204 11:03:55.525424699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4612446Z 2025-12-04T11:11:26.4612940Z [W1204 11:03:55.526031759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4612945Z 2025-12-04T11:11:26.4613451Z [W1204 11:03:55.526233893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4613459Z 2025-12-04T11:11:26.4613952Z [W1204 11:03:55.532216985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4613957Z 2025-12-04T11:11:26.4614466Z [W1204 11:03:55.533061159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4614471Z 2025-12-04T11:11:26.4614963Z [W1204 11:03:55.533255605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4614970Z 2025-12-04T11:11:26.4615082Z FAILED [0.4438s] [100%] 2025-12-04T11:11:26.4615087Z 2025-12-04T11:11:26.4615230Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4615719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4615855Z Traceback (most recent call last): 2025-12-04T11:11:26.4616356Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4616598Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4617059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4617218Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4617757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4617960Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4618086Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4618103Z 2025-12-04T11:11:26.4618206Z Expected 1 but got 2. 
2025-12-04T11:11:26.4618310Z Absolute difference: 1 2025-12-04T11:11:26.4618432Z Relative difference: 1.0 2025-12-04T11:11:26.4618437Z 2025-12-04T11:11:26.4618705Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4619583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4619617Z 2025-12-04T11:11:26.4619894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4620110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4620268Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4621140Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4621359Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4621463Z graph_break [] 2025-12-04T11:11:26.4621677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4622867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4622983Z if out == self.unknown_value: 2025-12-04T11:11:26.4623699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4623808Z warnings.warn( 2025-12-04T11:11:26.4624513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4624621Z warnings.warn( 2025-12-04T11:11:26.4625116Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4625236Z Traceback (most recent call last): 2025-12-04T11:11:26.4625747Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4625975Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4626424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4626603Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4627127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4627345Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4627473Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4627478Z 2025-12-04T11:11:26.4627580Z Expected 1 but got 2. 
2025-12-04T11:11:26.4627704Z Absolute difference: 1 2025-12-04T11:11:26.4627812Z Relative difference: 1.0 2025-12-04T11:11:26.4627816Z 2025-12-04T11:11:26.4628028Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4628921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4628927Z 2025-12-04T11:11:26.4629192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4629426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4629542Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4630413Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4630706Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4630803Z graph_break [] 2025-12-04T11:11:26.4631028Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4632237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4632380Z if out == self.unknown_value: 2025-12-04T11:11:26.4633100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4633199Z warnings.warn( 2025-12-04T11:11:26.4633921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4634023Z warnings.warn( 2025-12-04T11:11:26.4634237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4634360Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4634586Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4635466Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4635563Z graph_break [] 2025-12-04T11:11:26.4635773Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4636497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4636593Z warnings.warn( 2025-12-04T11:11:26.4637297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4637404Z warnings.warn( 2025-12-04T11:11:26.4637547Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4638054Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4638171Z 
Traceback (most recent call last): 2025-12-04T11:11:26.4638669Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4638908Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4639356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4639531Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4640063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4640263Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4640408Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4640414Z 2025-12-04T11:11:26.4640519Z Expected 1 but got 2. 2025-12-04T11:11:26.4640622Z Absolute difference: 1 2025-12-04T11:11:26.4640744Z Relative difference: 1.0 2025-12-04T11:11:26.4640748Z 2025-12-04T11:11:26.4640962Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4642039Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4642047Z 2025-12-04T11:11:26.4642312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4642607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4642735Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4643600Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4643876Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.4643972Z graph_break [] 2025-12-04T11:11:26.4644216Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4645410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4645523Z if out == self.unknown_value: 2025-12-04T11:11:26.4646249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4646345Z warnings.warn( 2025-12-04T11:11:26.4647052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4647166Z warnings.warn( 2025-12-04T11:11:26.4647379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4647494Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4647730Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4648597Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4648705Z graph_break [] 2025-12-04T11:11:26.4648919Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4649629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4649747Z warnings.warn( 2025-12-04T11:11:26.4650449Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4650563Z warnings.warn( 2025-12-04T11:11:26.4650771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4650883Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4651117Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4651988Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4652083Z graph_break [] 2025-12-04T11:11:26.4652305Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4653013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4653122Z warnings.warn( 2025-12-04T11:11:26.4653822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4653921Z warnings.warn( 2025-12-04T11:11:26.4654758Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml - 2025-12-04T11:11:26.4654927Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4655938Z FAILED [0.4438s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.4655972Z 2025-12-04T11:11:26.4656077Z Expected 1 but got 2. 2025-12-04T11:11:26.4656180Z Absolute difference: 1 2025-12-04T11:11:26.4656299Z Relative difference: 1.0 2025-12-04T11:11:26.4656304Z 2025-12-04T11:11:26.4656515Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4657529Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4657537Z 2025-12-04T11:11:26.4657899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4658158Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4658416Z ================== 1 failed, 10 deselected, 2 rerun in 20.28s ================== 2025-12-04T11:11:26.4658514Z Got exit code 1 2025-12-04T11:11:26.4659415Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4659901Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.4660344Z W1204 11:04:06.954000 91227 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4660997Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml 2025-12-04T11:11:26.4661160Z ============================= test session starts ============================== 2025-12-04T11:11:26.4661521Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4661630Z cachedir: .pytest_cache 2025-12-04T11:11:26.4662143Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4662280Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4662385Z configfile: pytest.ini 2025-12-04T11:11:26.4662913Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4663139Z collecting ... collected 58 items / 6 deselected / 52 selected 2025-12-04T11:11:26.4663277Z stepcurrent: skipping 6 already run items. 2025-12-04T11:11:26.4663401Z Running 5 items in this shard 2025-12-04T11:11:26.4663406Z 2025-12-04T11:11:26.4664268Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.1938s] [ 20%] 2025-12-04T11:11:26.4665121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8687s] [ 20%] 2025-12-04T11:11:26.4665900Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.8693s] [ 20%] 2025-12-04T11:11:26.4665906Z 2025-12-04T11:11:26.4666049Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4666569Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4666689Z Traceback (most recent call last): 2025-12-04T11:11:26.4667191Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4667536Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4667996Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4668174Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4668735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4668939Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4669122Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4669127Z 2025-12-04T11:11:26.4669234Z Expected 1 but got 2. 2025-12-04T11:11:26.4669338Z Absolute difference: 1 2025-12-04T11:11:26.4669457Z Relative difference: 1.0 2025-12-04T11:11:26.4669462Z 2025-12-04T11:11:26.4669670Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4670575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4670581Z 2025-12-04T11:11:26.4670842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4671060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4671186Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4671704Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4671941Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4672036Z graph_break [] 2025-12-04T11:11:26.4672247Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4672983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:11:26.4673083Z warnings.warn( 2025-12-04T11:11:26.4673804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4673904Z warnings.warn( 2025-12-04T11:11:26.4674409Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4674541Z Traceback (most recent call last): 2025-12-04T11:11:26.4675046Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4675273Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4675739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4675897Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4676435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4676639Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4676769Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4676774Z 2025-12-04T11:11:26.4676886Z Expected 1 but got 2. 
2025-12-04T11:11:26.4676990Z Absolute difference: 1 2025-12-04T11:11:26.4677095Z Relative difference: 1.0 2025-12-04T11:11:26.4677111Z 2025-12-04T11:11:26.4677324Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4678215Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4678221Z 2025-12-04T11:11:26.4678494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4678788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4678904Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4679434Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4679689Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4679797Z graph_break [] 2025-12-04T11:11:26.4680013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4680765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4680878Z warnings.warn( 2025-12-04T11:11:26.4681706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4681828Z warnings.warn( 2025-12-04T11:11:26.4682042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4682157Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4682400Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4682914Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4683009Z graph_break [] 2025-12-04T11:11:26.4683241Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4683949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4684066Z warnings.warn( 2025-12-04T11:11:26.4684772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4684870Z warnings.warn( 2025-12-04T11:11:26.4685024Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4685531Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4685650Z Traceback (most recent call last): 2025-12-04T11:11:26.4686162Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4686391Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4686849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4687009Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4687538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4687750Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4687878Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4687883Z 2025-12-04T11:11:26.4687999Z Expected 1 but got 2. 
2025-12-04T11:11:26.4688104Z Absolute difference: 1 2025-12-04T11:11:26.4688210Z Relative difference: 1.0 2025-12-04T11:11:26.4688215Z 2025-12-04T11:11:26.4688436Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4689341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4689346Z 2025-12-04T11:11:26.4689622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4689836Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4690026Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4690560Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4690814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4690908Z graph_break [] 2025-12-04T11:11:26.4691129Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4691844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4691990Z warnings.warn( 2025-12-04T11:11:26.4692699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4692796Z warnings.warn( 2025-12-04T11:11:26.4693023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4693136Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4693358Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4693892Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4693992Z graph_break [] 2025-12-04T11:11:26.4694217Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4694923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4695019Z warnings.warn( 2025-12-04T11:11:26.4695732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4695829Z warnings.warn( 2025-12-04T11:11:26.4696054Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4696166Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4696389Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4696928Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4697022Z graph_break [] 2025-12-04T11:11:26.4697234Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4697953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4698048Z warnings.warn( 2025-12-04T11:11:26.4698766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4698862Z warnings.warn( 2025-12-04T11:11:26.4699684Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml - 2025-12-04T11:11:26.4699866Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4700787Z FAILED [0.8693s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4700795Z 2025-12-04T11:11:26.4701093Z Expected 1 but got 2. 2025-12-04T11:11:26.4701209Z Absolute difference: 1 2025-12-04T11:11:26.4701331Z Relative difference: 1.0 2025-12-04T11:11:26.4701336Z 2025-12-04T11:11:26.4701552Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4702600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4702606Z 2025-12-04T11:11:26.4702890Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4703117Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4703327Z =================== 1 failed, 6 deselected, 2 rerun in 5.96s =================== 2025-12-04T11:11:26.4703467Z Got exit code 1 2025-12-04T11:11:26.4703574Z Retrying single test... 
2025-12-04T11:11:26.4704031Z W1204 11:04:27.382000 91404 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4704677Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml 2025-12-04T11:11:26.4704841Z ============================= test session starts ============================== 2025-12-04T11:11:26.4705200Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4705309Z cachedir: .pytest_cache 2025-12-04T11:11:26.4705841Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4705966Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4706073Z configfile: pytest.ini 2025-12-04T11:11:26.4706616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4706834Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4707812Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4707945Z Running 1 items in this shard 2025-12-04T11:11:26.4707952Z 2025-12-04T11:11:26.4709216Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:31.000323376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4709224Z 2025-12-04T11:11:26.4709750Z [W1204 11:04:46.196910160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4709757Z 2025-12-04T11:11:26.4710262Z [W1204 11:04:46.197171271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4710267Z 2025-12-04T11:11:26.4710785Z [W1204 11:04:46.204393783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4710790Z 2025-12-04T11:11:26.4711294Z [W1204 11:04:46.205117380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4711298Z 2025-12-04T11:11:26.4711811Z [W1204 11:04:46.205303649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4711818Z 2025-12-04T11:11:26.4712312Z [W1204 11:04:46.212085762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4712319Z 2025-12-04T11:11:26.4712829Z [W1204 11:04:46.212726105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4712834Z 2025-12-04T11:11:26.4713326Z [W1204 11:04:46.212908166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4713331Z 2025-12-04T11:11:26.4713898Z [W1204 11:04:48.157651997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4713915Z 2025-12-04T11:11:26.4714414Z [W1204 11:04:48.159427203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4714449Z 2025-12-04T11:11:26.4714944Z [W1204 11:04:48.159642979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4714981Z 2025-12-04T11:11:26.4715495Z [W1204 11:04:48.163627380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4715499Z 2025-12-04T11:11:26.4715996Z [W1204 11:04:48.164285044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4716001Z 2025-12-04T11:11:26.4716512Z [W1204 11:04:48.164482973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4716516Z 2025-12-04T11:11:26.4717013Z [W1204 11:04:48.170476830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4717020Z 2025-12-04T11:11:26.4717532Z [W1204 11:04:48.171132422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4717539Z 2025-12-04T11:11:26.4718035Z [W1204 11:04:48.171324593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4718040Z 2025-12-04T11:11:26.4718184Z ('RERUN', {'yellow': True}) [19.4180s] [100%] 2025-12-04T11:11:26.4719441Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:49.988740256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4719447Z 2025-12-04T11:11:26.4719947Z [W1204 11:04:49.989502499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4719966Z 2025-12-04T11:11:26.4720460Z [W1204 11:04:49.989698741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4720467Z 2025-12-04T11:11:26.4720967Z [W1204 11:04:49.993612644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4720971Z 2025-12-04T11:11:26.4721541Z [W1204 11:04:49.994441634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4721549Z 2025-12-04T11:11:26.4722051Z [W1204 11:04:49.994631138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4722055Z 2025-12-04T11:11:26.4722565Z [W1204 11:04:49.000614473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4722572Z 2025-12-04T11:11:26.4723069Z [W1204 11:04:49.001266611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4723074Z 2025-12-04T11:11:26.4723589Z [W1204 11:04:49.001451465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4723593Z 2025-12-04T11:11:26.4724087Z [W1204 11:04:49.088418889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4724091Z 2025-12-04T11:11:26.4724666Z [W1204 11:04:49.089219896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4724685Z 2025-12-04T11:11:26.4725183Z [W1204 11:04:49.089426241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4725217Z 2025-12-04T11:11:26.4725720Z [W1204 11:04:49.093407415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4725726Z 2025-12-04T11:11:26.4726238Z [W1204 11:04:49.094069794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4726272Z 2025-12-04T11:11:26.4726771Z [W1204 11:04:49.094277051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4726775Z 2025-12-04T11:11:26.4727289Z [W1204 11:04:49.100235056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4727294Z 2025-12-04T11:11:26.4727787Z [W1204 11:04:49.101077542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4727795Z 2025-12-04T11:11:26.4728306Z [W1204 11:04:49.101267821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4728311Z 2025-12-04T11:11:26.4728442Z ('RERUN', {'yellow': True}) [0.8908s] [100%] 2025-12-04T11:11:26.4729715Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:50.864116671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4729720Z 2025-12-04T11:11:26.4730223Z [W1204 11:04:50.864883787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4730228Z 2025-12-04T11:11:26.4730723Z [W1204 11:04:50.865080111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4730743Z 2025-12-04T11:11:26.4731237Z [W1204 11:04:50.868993425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4731242Z 2025-12-04T11:11:26.4731740Z [W1204 11:04:50.869684322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4731746Z 2025-12-04T11:11:26.4732258Z [W1204 11:04:50.869877254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4732263Z 2025-12-04T11:11:26.4732769Z [W1204 11:04:50.875946421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4732773Z 2025-12-04T11:11:26.4733285Z [W1204 11:04:50.876615441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4733291Z 2025-12-04T11:11:26.4733790Z [W1204 11:04:50.876799963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4733795Z 2025-12-04T11:11:26.4734307Z [W1204 11:04:50.962488634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4734314Z 2025-12-04T11:11:26.4734812Z [W1204 11:04:50.963248245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4734816Z 2025-12-04T11:11:26.4735327Z [W1204 11:04:50.963448590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4735394Z 2025-12-04T11:11:26.4735898Z [W1204 11:04:50.967268727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4735902Z 2025-12-04T11:11:26.4736436Z [W1204 11:04:50.967886517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4736440Z 2025-12-04T11:11:26.4736955Z [W1204 11:04:50.968078942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4737001Z 2025-12-04T11:11:26.4737504Z [W1204 11:04:50.973942722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4737509Z 2025-12-04T11:11:26.4738019Z [W1204 11:04:50.974741331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4738024Z 2025-12-04T11:11:26.4738528Z [W1204 11:04:50.974931402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4738533Z 2025-12-04T11:11:26.4738645Z FAILED [0.8705s] [100%] 2025-12-04T11:11:26.4738652Z 2025-12-04T11:11:26.4738793Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4739298Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4739433Z Traceback (most recent call last): 2025-12-04T11:11:26.4739938Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4740179Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4740638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4740803Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4741342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4741544Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4741685Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4741690Z 2025-12-04T11:11:26.4741792Z Expected 1 but got 2. 
2025-12-04T11:11:26.4741897Z Absolute difference: 1 2025-12-04T11:11:26.4742017Z Relative difference: 1.0 2025-12-04T11:11:26.4742025Z 2025-12-04T11:11:26.4742236Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4743131Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4743149Z 2025-12-04T11:11:26.4743414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4743634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4743763Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4744287Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4744514Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4744622Z graph_break [] 2025-12-04T11:11:26.4744834Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4746037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4746149Z if out == self.unknown_value: 2025-12-04T11:11:26.4746927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4747043Z warnings.warn( 2025-12-04T11:11:26.4747747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4747890Z warnings.warn( 2025-12-04T11:11:26.4748394Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4748542Z Traceback (most recent call last): 2025-12-04T11:11:26.4749052Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4749280Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4749734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4749905Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4750428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4750641Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4750767Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4750772Z 2025-12-04T11:11:26.4750877Z Expected 1 but got 2. 
2025-12-04T11:11:26.4750996Z Absolute difference: 1 2025-12-04T11:11:26.4751106Z Relative difference: 1.0 2025-12-04T11:11:26.4751111Z 2025-12-04T11:11:26.4751319Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4752228Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4752233Z 2025-12-04T11:11:26.4752498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4752726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4752839Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4753355Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4753592Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4753691Z graph_break [] 2025-12-04T11:11:26.4753916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4755092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4755206Z if out == self.unknown_value: 2025-12-04T11:11:26.4755931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4756029Z warnings.warn( 2025-12-04T11:11:26.4756749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4756846Z warnings.warn( 2025-12-04T11:11:26.4757055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4757182Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4757405Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4757918Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4758023Z graph_break [] 2025-12-04T11:11:26.4758295Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4759014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4759141Z warnings.warn( 2025-12-04T11:11:26.4759845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4759983Z warnings.warn( 2025-12-04T11:11:26.4760125Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4760641Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4760760Z Traceback (most recent call last): 2025-12-04T11:11:26.4761261Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4761561Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4762011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4762176Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4762715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4762916Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4763062Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4763067Z 2025-12-04T11:11:26.4763167Z Expected 1 but got 2. 2025-12-04T11:11:26.4763272Z Absolute difference: 1 2025-12-04T11:11:26.4763392Z Relative difference: 1.0 2025-12-04T11:11:26.4763396Z 2025-12-04T11:11:26.4763606Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4764523Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4764528Z 2025-12-04T11:11:26.4764851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4765066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4765192Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4765709Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4765931Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4766040Z graph_break [] 2025-12-04T11:11:26.4766252Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.4767453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4767567Z if out == self.unknown_value: 2025-12-04T11:11:26.4768275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4768386Z warnings.warn( 2025-12-04T11:11:26.4769088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4769199Z warnings.warn( 2025-12-04T11:11:26.4769412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4769522Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4769759Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4770340Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4770437Z graph_break [] 2025-12-04T11:11:26.4770659Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4771396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4771541Z warnings.warn( 2025-12-04T11:11:26.4772246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4772342Z warnings.warn( 2025-12-04T11:11:26.4772566Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4772679Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4772918Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4773432Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4773530Z graph_break [] 2025-12-04T11:11:26.4773759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4774464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4774565Z warnings.warn( 2025-12-04T11:11:26.4775284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4775380Z warnings.warn( 2025-12-04T11:11:26.4776215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml - 2025-12-04T11:11:26.4776387Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4777313Z FAILED [0.8705s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4777335Z 2025-12-04T11:11:26.4777438Z Expected 1 but got 2. 
2025-12-04T11:11:26.4777543Z Absolute difference: 1 2025-12-04T11:11:26.4777668Z Relative difference: 1.0 2025-12-04T11:11:26.4777673Z 2025-12-04T11:11:26.4777889Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4778783Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4778789Z 2025-12-04T11:11:26.4779073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4779252Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4779465Z ================== 1 failed, 10 deselected, 2 rerun in 21.21s ================== 2025-12-04T11:11:26.4779619Z Got exit code 1 2025-12-04T11:11:26.4779725Z Retrying single test... 2025-12-04T11:11:26.4780180Z W1204 11:05:01.816000 91586 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4780832Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml 2025-12-04T11:11:26.4781011Z ============================= test session starts ============================== 2025-12-04T11:11:26.4781353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4781461Z cachedir: .pytest_cache 2025-12-04T11:11:26.4782061Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4782187Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4782291Z configfile: pytest.ini 2025-12-04T11:11:26.4782966Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.4783183Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4784205Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4784319Z Running 1 items in this shard 2025-12-04T11:11:26.4784324Z 2025-12-04T11:11:26.4785587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:05.431409412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4785607Z 2025-12-04T11:11:26.4786117Z [W1204 11:05:20.612989526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4786124Z 2025-12-04T11:11:26.4786627Z [W1204 11:05:20.613247368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4786635Z 2025-12-04T11:11:26.4787146Z [W1204 11:05:21.620638634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4787151Z 2025-12-04T11:11:26.4787653Z [W1204 11:05:21.621390514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4787658Z 2025-12-04T11:11:26.4788172Z [W1204 11:05:21.621581028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4788177Z 2025-12-04T11:11:26.4788673Z [W1204 11:05:21.628482846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4788680Z 2025-12-04T11:11:26.4789193Z [W1204 11:05:21.629152339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4789200Z 2025-12-04T11:11:26.4789695Z [W1204 11:05:21.629334412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4789700Z 2025-12-04T11:11:26.4790205Z [W1204 11:05:22.578988189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4790210Z 2025-12-04T11:11:26.4790714Z [W1204 11:05:22.580795907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4790719Z 2025-12-04T11:11:26.4791216Z [W1204 11:05:22.581014613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4791238Z 2025-12-04T11:11:26.4791732Z [W1204 11:05:22.585065347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4791738Z 2025-12-04T11:11:26.4792233Z [W1204 11:05:22.585761585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4792238Z 2025-12-04T11:11:26.4792743Z [W1204 11:05:22.585961679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4792748Z 2025-12-04T11:11:26.4793316Z [W1204 11:05:22.592274069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4793322Z 2025-12-04T11:11:26.4793832Z [W1204 11:05:22.593010626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4793867Z 2025-12-04T11:11:26.4794365Z [W1204 11:05:22.593211046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4794370Z 2025-12-04T11:11:26.4794539Z ('RERUN', {'yellow': True}) [19.4085s] [100%] 2025-12-04T11:11:26.4795791Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:23.403142050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4795797Z 2025-12-04T11:11:26.4796298Z [W1204 11:05:23.403907558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4796315Z 2025-12-04T11:11:26.4796814Z [W1204 11:05:23.404110041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4796821Z 2025-12-04T11:11:26.4797319Z [W1204 11:05:23.408049590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4797323Z 2025-12-04T11:11:26.4797835Z [W1204 11:05:23.408835615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4797839Z 2025-12-04T11:11:26.4798339Z [W1204 11:05:23.409024086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4798344Z 2025-12-04T11:11:26.4798858Z [W1204 11:05:23.415015354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4798863Z 2025-12-04T11:11:26.4799358Z [W1204 11:05:23.415645931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4799365Z 2025-12-04T11:11:26.4799872Z [W1204 11:05:23.415831552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4799877Z 2025-12-04T11:11:26.4800374Z [W1204 11:05:23.500886908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4800382Z 2025-12-04T11:11:26.4801115Z [W1204 11:05:23.501603822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4801121Z 2025-12-04T11:11:26.4801672Z [W1204 11:05:23.501803129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4801678Z 2025-12-04T11:11:26.4802176Z [W1204 11:05:23.505621832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4802198Z 2025-12-04T11:11:26.4802693Z [W1204 11:05:23.506242305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4802698Z 2025-12-04T11:11:26.4803192Z [W1204 11:05:23.506432061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4803199Z 2025-12-04T11:11:26.4803710Z [W1204 11:05:23.512355703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4803714Z 2025-12-04T11:11:26.4804213Z [W1204 11:05:23.513130560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4804348Z 2025-12-04T11:11:26.4804860Z [W1204 11:05:23.513319968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4804864Z 2025-12-04T11:11:26.4805036Z ('RERUN', {'yellow': True}) [0.8798s] [100%] 2025-12-04T11:11:26.4806303Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:24.267476347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4806351Z 2025-12-04T11:11:26.4806850Z [W1204 11:05:24.268263720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4806855Z 2025-12-04T11:11:26.4807367Z [W1204 11:05:24.268464759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4807372Z 2025-12-04T11:11:26.4807871Z [W1204 11:05:24.272494294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4807877Z 2025-12-04T11:11:26.4808377Z [W1204 11:05:24.273176563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4808394Z 2025-12-04T11:11:26.4808893Z [W1204 11:05:24.273372890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4808900Z 2025-12-04T11:11:26.4809397Z [W1204 11:05:24.279395955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4809402Z 2025-12-04T11:11:26.4809910Z [W1204 11:05:24.280081463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4809918Z 2025-12-04T11:11:26.4810416Z [W1204 11:05:24.280275795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4810420Z 2025-12-04T11:11:26.4810935Z [W1204 11:05:24.370255688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4810940Z 2025-12-04T11:11:26.4811435Z [W1204 11:05:24.371073370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4811442Z 2025-12-04T11:11:26.4811955Z [W1204 11:05:24.371287641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4811960Z 2025-12-04T11:11:26.4812455Z [W1204 11:05:24.375288327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4812459Z 2025-12-04T11:11:26.4812960Z [W1204 11:05:24.375971980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4812978Z 2025-12-04T11:11:26.4813475Z [W1204 11:05:24.376176547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4813482Z 2025-12-04T11:11:26.4813979Z [W1204 11:05:24.382269564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4813985Z 2025-12-04T11:11:26.4814496Z [W1204 11:05:24.383153170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4814501Z 2025-12-04T11:11:26.4814998Z [W1204 11:05:24.383353443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4815003Z 2025-12-04T11:11:26.4815115Z FAILED [0.8703s] [100%] 2025-12-04T11:11:26.4815184Z 2025-12-04T11:11:26.4815329Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4815847Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4815999Z Traceback (most recent call last): 2025-12-04T11:11:26.4816503Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4816775Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4817227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4817389Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4817927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4818133Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4818277Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4818282Z 2025-12-04T11:11:26.4818383Z Expected 1 but got 2. 
2025-12-04T11:11:26.4818490Z Absolute difference: 1 2025-12-04T11:11:26.4818612Z Relative difference: 1.0 2025-12-04T11:11:26.4818617Z 2025-12-04T11:11:26.4818828Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4819722Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4819741Z 2025-12-04T11:11:26.4820002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4820216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4820342Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4820861Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4821080Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4821191Z graph_break [] 2025-12-04T11:11:26.4821400Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4822594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4822710Z if out == self.unknown_value: 2025-12-04T11:11:26.4823419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4823529Z warnings.warn( 2025-12-04T11:11:26.4824237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4824346Z warnings.warn( 2025-12-04T11:11:26.4824851Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4824971Z Traceback (most recent call last): 2025-12-04T11:11:26.4825480Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4825708Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4826168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4826326Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4826922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4827139Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4827267Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4827272Z 2025-12-04T11:11:26.4827404Z Expected 1 but got 2. 
2025-12-04T11:11:26.4827520Z Absolute difference: 1 2025-12-04T11:11:26.4827626Z Relative difference: 1.0 2025-12-04T11:11:26.4827631Z 2025-12-04T11:11:26.4827854Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4828746Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4828780Z 2025-12-04T11:11:26.4829042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4829270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4829385Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4829913Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4830135Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4830233Z graph_break [] 2025-12-04T11:11:26.4830458Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4831633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4831748Z if out == self.unknown_value: 2025-12-04T11:11:26.4832472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4832574Z warnings.warn( 2025-12-04T11:11:26.4833296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4833396Z warnings.warn( 2025-12-04T11:11:26.4833608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4833732Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4833954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4834484Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4834577Z graph_break [] 2025-12-04T11:11:26.4834785Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4835699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4835801Z warnings.warn( 2025-12-04T11:11:26.4836510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4836626Z warnings.warn( 2025-12-04T11:11:26.4836767Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4837286Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.4837407Z Traceback (most recent call last): 2025-12-04T11:11:26.4837907Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4838149Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4838667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4838846Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4839367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4839599Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4839739Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4839744Z 2025-12-04T11:11:26.4839846Z Expected 1 but got 2. 2025-12-04T11:11:26.4839981Z Absolute difference: 1 2025-12-04T11:11:26.4840100Z Relative difference: 1.0 2025-12-04T11:11:26.4840105Z 2025-12-04T11:11:26.4840314Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4841220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4841226Z 2025-12-04T11:11:26.4841570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4841788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4841917Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4842440Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4842675Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4842772Z graph_break [] 2025-12-04T11:11:26.4842988Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.4844189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4844310Z if out == self.unknown_value: 2025-12-04T11:11:26.4845050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4845156Z warnings.warn( 2025-12-04T11:11:26.4845868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4845984Z warnings.warn( 2025-12-04T11:11:26.4846199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4846315Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4846557Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4847078Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4847188Z graph_break [] 2025-12-04T11:11:26.4847404Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4848118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4848234Z warnings.warn( 2025-12-04T11:11:26.4848942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4849054Z warnings.warn( 2025-12-04T11:11:26.4849270Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4849382Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4849622Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4850144Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.4850313Z graph_break [] 2025-12-04T11:11:26.4850539Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4851251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4851394Z warnings.warn( 2025-12-04T11:11:26.4852103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4852232Z warnings.warn( 2025-12-04T11:11:26.4853068Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml - 2025-12-04T11:11:26.4853237Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4854188Z FAILED [0.8703s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4854194Z 2025-12-04T11:11:26.4854298Z Expected 1 but got 2. 
2025-12-04T11:11:26.4854406Z Absolute difference: 1 2025-12-04T11:11:26.4854526Z Relative difference: 1.0 2025-12-04T11:11:26.4854531Z 2025-12-04T11:11:26.4854744Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4855632Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4855652Z 2025-12-04T11:11:26.4855911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4856086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4856299Z ================== 1 failed, 10 deselected, 2 rerun in 21.19s ================== 2025-12-04T11:11:26.4856394Z Got exit code 1 2025-12-04T11:11:26.4857203Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.4857622Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.4858054Z W1204 11:05:35.814000 91768 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4858713Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml 2025-12-04T11:11:26.4858873Z ============================= test session starts ============================== 2025-12-04T11:11:26.4859215Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4859339Z cachedir: .pytest_cache 2025-12-04T11:11:26.4859850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.4859983Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4860089Z configfile: pytest.ini 2025-12-04T11:11:26.4860616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4860843Z collecting ... collected 58 items / 7 deselected / 51 selected 2025-12-04T11:11:26.4860981Z stepcurrent: skipping 7 already run items. 2025-12-04T11:11:26.4861091Z Running 4 items in this shard 2025-12-04T11:11:26.4861109Z 2025-12-04T11:11:26.4862359Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:05:41.310000 91768 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4862490Z ('RERUN', {'yellow': True}) [3.8797s] [ 25%] 2025-12-04T11:11:26.4863344Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5074s] [ 25%] 2025-12-04T11:11:26.4864140Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5146s] [ 25%] 2025-12-04T11:11:26.4864177Z 2025-12-04T11:11:26.4864328Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4864825Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4864944Z Traceback (most recent call last): 2025-12-04T11:11:26.4865459Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4865685Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4866155Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4866320Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4866843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4867063Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4867193Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4867198Z 2025-12-04T11:11:26.4867302Z Expected 1 but got 0. 2025-12-04T11:11:26.4867417Z Absolute difference: 1 2025-12-04T11:11:26.4867523Z Relative difference: 1.0 2025-12-04T11:11:26.4867528Z 2025-12-04T11:11:26.4867747Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4868634Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4868640Z 2025-12-04T11:11:26.4868904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4869129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4869241Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4869937Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4870161Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4870256Z graph_break [] 2025-12-04T11:11:26.4870384Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4870601Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4871320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4871433Z warnings.warn( 2025-12-04T11:11:26.4872141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4872251Z warnings.warn( 2025-12-04T11:11:26.4872746Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4872866Z Traceback (most recent call last): 2025-12-04T11:11:26.4873376Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4873603Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4874150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4874312Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4874836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4875077Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4875207Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4875212Z 2025-12-04T11:11:26.4875345Z Expected 1 but got 0. 
2025-12-04T11:11:26.4875463Z Absolute difference: 1 2025-12-04T11:11:26.4875570Z Relative difference: 1.0 2025-12-04T11:11:26.4875575Z 2025-12-04T11:11:26.4875800Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4876693Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4876702Z 2025-12-04T11:11:26.4876963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4877194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4877310Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4878006Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4878228Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4878323Z graph_break [] 2025-12-04T11:11:26.4878454Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4878664Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4879385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4879495Z warnings.warn( 2025-12-04T11:11:26.4880206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4880320Z warnings.warn( 2025-12-04T11:11:26.4880531Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4880642Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4880876Z aot_autograd [('total', 
2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4881618Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4881733Z graph_break [] 2025-12-04T11:11:26.4881852Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4882063Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4882792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4882890Z warnings.warn( 2025-12-04T11:11:26.4883593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4883705Z warnings.warn( 2025-12-04T11:11:26.4883848Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4884367Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4884485Z Traceback (most recent call last): 2025-12-04T11:11:26.4884982Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4885225Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4885754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4885916Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4886483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4886684Z raise error_metas.pop()[0].to_error( # type: ignore[index] 
2025-12-04T11:11:26.4886825Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4886860Z 2025-12-04T11:11:26.4886960Z Expected 1 but got 0. 2025-12-04T11:11:26.4887063Z Absolute difference: 1 2025-12-04T11:11:26.4887186Z Relative difference: 1.0 2025-12-04T11:11:26.4887191Z 2025-12-04T11:11:26.4887398Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4888299Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4888304Z 2025-12-04T11:11:26.4888565Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4888779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4888906Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4889579Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4889813Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4889908Z graph_break [] 2025-12-04T11:11:26.4890025Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4890249Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4890966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4891062Z warnings.warn( 2025-12-04T11:11:26.4891780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4891879Z warnings.warn( 2025-12-04T11:11:26.4892102Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:11:26.4892212Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4892436Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4893123Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4893217Z graph_break [] 2025-12-04T11:11:26.4893335Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4893559Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4894272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4894384Z warnings.warn( 2025-12-04T11:11:26.4895087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4895183Z warnings.warn( 2025-12-04T11:11:26.4895404Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4895515Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4895736Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4896431Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4896599Z graph_break [] 2025-12-04T11:11:26.4896734Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4896944Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4897651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:11:26.4897797Z warnings.warn( 2025-12-04T11:11:26.4898500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4898644Z warnings.warn( 2025-12-04T11:11:26.4899468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml - 2025-12-04T11:11:26.4899638Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4900573Z FAILED [0.5146s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4900580Z 2025-12-04T11:11:26.4900683Z Expected 1 but got 0. 2025-12-04T11:11:26.4900797Z Absolute difference: 1 2025-12-04T11:11:26.4901229Z Relative difference: 1.0 2025-12-04T11:11:26.4901235Z 2025-12-04T11:11:26.4901447Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4902349Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4902354Z 2025-12-04T11:11:26.4902616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4902806Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4903000Z =================== 1 failed, 7 deselected, 2 rerun in 4.93s =================== 2025-12-04T11:11:26.4903102Z Got exit code 1 2025-12-04T11:11:26.4903221Z Retrying single test... 
2025-12-04T11:11:26.4903655Z W1204 11:05:55.446000 91945 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4904304Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml 2025-12-04T11:11:26.4904482Z ============================= test session starts ============================== 2025-12-04T11:11:26.4904825Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4904948Z cachedir: .pytest_cache 2025-12-04T11:11:26.4905457Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4905583Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4905703Z configfile: pytest.ini 2025-12-04T11:11:26.4906232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4906449Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4907424Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4907538Z Running 1 items in this shard 2025-12-04T11:11:26.4907543Z 2025-12-04T11:11:26.4908803Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:00.441921241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4908951Z 2025-12-04T11:11:26.4909465Z [W1204 11:06:16.056761902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4909470Z 2025-12-04T11:11:26.4910033Z [W1204 11:06:16.057022595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4910038Z 2025-12-04T11:11:26.4910540Z [W1204 11:06:16.064989326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4910621Z 2025-12-04T11:11:26.4911141Z [W1204 11:06:16.065843956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4911146Z 2025-12-04T11:11:26.4911645Z [W1204 11:06:16.066030596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4911649Z 2025-12-04T11:11:26.4912153Z [W1204 11:06:16.073356303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4912173Z 2025-12-04T11:11:26.4912672Z [W1204 11:06:16.074007942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4912678Z 2025-12-04T11:11:26.4913180Z [W1204 11:06:16.074201050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4913187Z 2025-12-04T11:11:26.4913657Z W1204 11:06:16.564000 91945 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4914159Z [W1204 11:06:16.260632875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4914163Z 2025-12-04T11:11:26.4914684Z [W1204 11:06:16.262356546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4914688Z 2025-12-04T11:11:26.4915187Z [W1204 11:06:16.262560770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4915193Z 2025-12-04T11:11:26.4915703Z [W1204 11:06:16.267025115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4915707Z 2025-12-04T11:11:26.4916208Z [W1204 11:06:16.267638604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4916213Z 2025-12-04T11:11:26.4916725Z [W1204 11:06:16.267828410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4916729Z 2025-12-04T11:11:26.4917230Z [W1204 11:06:16.274289247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4917235Z 2025-12-04T11:11:26.4917736Z [W1204 11:06:16.274902933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4917744Z 2025-12-04T11:11:26.4918257Z [W1204 11:06:16.275089492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4918263Z 2025-12-04T11:11:26.4918395Z ('RERUN', {'yellow': True}) [19.5066s] [100%] 2025-12-04T11:11:26.4919660Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:17.715977763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4919666Z 2025-12-04T11:11:26.4920231Z [W1204 11:06:17.716671350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4920237Z 2025-12-04T11:11:26.4920752Z [W1204 11:06:17.716873571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4920787Z 2025-12-04T11:11:26.4921285Z [W1204 11:06:17.721635021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4921289Z 2025-12-04T11:11:26.4921864Z [W1204 11:06:17.722242987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4921907Z 2025-12-04T11:11:26.4922406Z [W1204 11:06:17.722427648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4922410Z 2025-12-04T11:11:26.4922915Z [W1204 11:06:17.728965216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4922937Z 2025-12-04T11:11:26.4923434Z [W1204 11:06:17.729573685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4923441Z 2025-12-04T11:11:26.4923940Z [W1204 11:06:17.729757201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4923945Z 2025-12-04T11:11:26.4924456Z [W1204 11:06:17.832629080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4924463Z 2025-12-04T11:11:26.4924956Z [W1204 11:06:17.833434828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4924961Z 2025-12-04T11:11:26.4925471Z [W1204 11:06:17.833650672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4925480Z 2025-12-04T11:11:26.4925979Z [W1204 11:06:17.838238860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4925983Z 2025-12-04T11:11:26.4926502Z [W1204 11:06:17.838921872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4926506Z 2025-12-04T11:11:26.4927007Z [W1204 11:06:17.839116695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4927014Z 2025-12-04T11:11:26.4927524Z [W1204 11:06:17.845755241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4927529Z 2025-12-04T11:11:26.4928026Z [W1204 11:06:17.846409992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4928030Z 2025-12-04T11:11:26.4928527Z [W1204 11:06:17.846600365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4928546Z 2025-12-04T11:11:26.4928674Z ('RERUN', {'yellow': True}) [0.5327s] [100%] 2025-12-04T11:11:26.4929925Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:17.224991281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4929932Z 2025-12-04T11:11:26.4930440Z [W1204 11:06:17.225740282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4930445Z 2025-12-04T11:11:26.4930941Z [W1204 11:06:17.225937479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4931014Z 2025-12-04T11:11:26.4931523Z [W1204 11:06:17.230701523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4931527Z 2025-12-04T11:11:26.4932054Z [W1204 11:06:17.231320251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4932059Z 2025-12-04T11:11:26.4932570Z [W1204 11:06:17.231507341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4932605Z 2025-12-04T11:11:26.4933102Z [W1204 11:06:17.237938082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4933106Z 2025-12-04T11:11:26.4933603Z [W1204 11:06:17.238548021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4933620Z 2025-12-04T11:11:26.4934121Z [W1204 11:06:17.238731677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4934126Z 2025-12-04T11:11:26.4934623Z [W1204 11:06:17.340810390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4934629Z 2025-12-04T11:11:26.4935138Z [W1204 11:06:17.341543255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4935145Z 2025-12-04T11:11:26.4935744Z [W1204 11:06:17.341744221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4935749Z 2025-12-04T11:11:26.4936260Z [W1204 11:06:17.346425265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4936265Z 2025-12-04T11:11:26.4936768Z [W1204 11:06:17.347075578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4936773Z 2025-12-04T11:11:26.4937282Z [W1204 11:06:17.347270749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4937289Z 2025-12-04T11:11:26.4937785Z [W1204 11:06:17.355793590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4937792Z 2025-12-04T11:11:26.4938299Z [W1204 11:06:17.356944014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4938304Z 2025-12-04T11:11:26.4938807Z [W1204 11:06:17.357151327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4938811Z 2025-12-04T11:11:26.4938911Z FAILED [0.5113s] [100%] 2025-12-04T11:11:26.4938915Z 2025-12-04T11:11:26.4939070Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.4939568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4939704Z Traceback (most recent call last): 2025-12-04T11:11:26.4940204Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4940431Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4940903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4941065Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4941604Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4941873Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4942005Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4942010Z 2025-12-04T11:11:26.4942124Z Expected 1 but got 0. 
2025-12-04T11:11:26.4942227Z Absolute difference: 1 2025-12-04T11:11:26.4942367Z Relative difference: 1.0 2025-12-04T11:11:26.4942372Z 2025-12-04T11:11:26.4942590Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4943474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4943510Z 2025-12-04T11:11:26.4943785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4944000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4944113Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4944805Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4945028Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4945138Z graph_break [] 2025-12-04T11:11:26.4945254Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4945465Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4946663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4946780Z if out == self.unknown_value: 2025-12-04T11:11:26.4947510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4947613Z warnings.warn( 2025-12-04T11:11:26.4948323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4948437Z warnings.warn( 2025-12-04T11:11:26.4948935Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.4949055Z Traceback (most recent call last): 2025-12-04T11:11:26.4949566Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4949795Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4950255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4950414Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4950941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4951155Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4951282Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4951289Z 2025-12-04T11:11:26.4951404Z Expected 1 but got 0. 
2025-12-04T11:11:26.4951507Z Absolute difference: 1 2025-12-04T11:11:26.4951613Z Relative difference: 1.0 2025-12-04T11:11:26.4951617Z 2025-12-04T11:11:26.4951837Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4952720Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4952724Z 2025-12-04T11:11:26.4952988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4953277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4953391Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4954085Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4954359Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4954454Z graph_break [] 2025-12-04T11:11:26.4954590Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4954805Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4956030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.4956144Z if out == self.unknown_value: 2025-12-04T11:11:26.4956866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4956979Z warnings.warn( 2025-12-04T11:11:26.4957689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4957805Z warnings.warn( 2025-12-04T11:11:26.4958022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4958135Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4958376Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4959081Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4959212Z graph_break [] 2025-12-04T11:11:26.4959385Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4959703Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4960513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4960615Z warnings.warn( 2025-12-04T11:11:26.4961407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4961586Z warnings.warn( 2025-12-04T11:11:26.4961732Z =================================== FAILURES =================================== 2025-12-04T11:11:26.4962225Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:11:26.4962360Z Traceback (most recent call last): 2025-12-04T11:11:26.4962863Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.4963103Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.4963549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.4963711Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.4964247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.4964448Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.4964577Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.4964595Z 2025-12-04T11:11:26.4964699Z Expected 1 but got 0. 2025-12-04T11:11:26.4964804Z Absolute difference: 1 2025-12-04T11:11:26.4964926Z Relative difference: 1.0 2025-12-04T11:11:26.4964930Z 2025-12-04T11:11:26.4965141Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4966213Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4966232Z 2025-12-04T11:11:26.4966528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4966742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4966871Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4967547Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4967804Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:11:26.4967913Z graph_break [] 2025-12-04T11:11:26.4968031Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4968244Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4969444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.4969559Z if out == self.unknown_value: 2025-12-04T11:11:26.4970284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4970386Z warnings.warn( 2025-12-04T11:11:26.4971097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4971215Z warnings.warn( 2025-12-04T11:11:26.4971429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4971555Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4971780Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4972461Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4972571Z graph_break [] 2025-12-04T11:11:26.4972686Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4972896Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4973619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4973720Z warnings.warn( 
2025-12-04T11:11:26.4974437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4974534Z warnings.warn( 2025-12-04T11:11:26.4974746Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.4974869Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.4975094Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.4975780Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.4975873Z graph_break [] 2025-12-04T11:11:26.4975991Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.4976215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.4976925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4977032Z warnings.warn( 2025-12-04T11:11:26.4977816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.4977915Z warnings.warn( 2025-12-04T11:11:26.4978750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml - 2025-12-04T11:11:26.4978959Z =========================== short test summary info ============================ 2025-12-04T11:11:26.4979882Z FAILED [0.5113s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.4979936Z 2025-12-04T11:11:26.4980041Z Expected 1 but got 0. 2025-12-04T11:11:26.4980146Z Absolute difference: 1 2025-12-04T11:11:26.4980270Z Relative difference: 1.0 2025-12-04T11:11:26.4980275Z 2025-12-04T11:11:26.4980489Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.4981375Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4981381Z 2025-12-04T11:11:26.4981663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.4981843Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.4982052Z ================== 1 failed, 10 deselected, 2 rerun in 20.58s ================== 2025-12-04T11:11:26.4982153Z Got exit code 1 2025-12-04T11:11:26.4982259Z Retrying single test... 2025-12-04T11:11:26.4982715Z W1204 11:06:28.890000 92127 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4983413Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml 2025-12-04T11:11:26.4983598Z ============================= test session starts ============================== 2025-12-04T11:11:26.4983944Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.4984053Z cachedir: .pytest_cache 2025-12-04T11:11:26.4984578Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.4984700Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.4984807Z configfile: pytest.ini 2025-12-04T11:11:26.4985356Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.4985576Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.4986558Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.4986672Z Running 1 items in this shard 2025-12-04T11:11:26.4986677Z 2025-12-04T11:11:26.4987942Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:34.961743955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4987964Z 2025-12-04T11:11:26.4988470Z [W1204 11:06:49.983476923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4988478Z 2025-12-04T11:11:26.4988980Z [W1204 11:06:49.983738171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4988985Z 2025-12-04T11:11:26.4989588Z [W1204 11:06:49.991654007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4989593Z 2025-12-04T11:11:26.4990092Z [W1204 11:06:49.992527289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4990130Z 2025-12-04T11:11:26.4990641Z [W1204 11:06:49.992715880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4990646Z 2025-12-04T11:11:26.4991145Z [W1204 11:06:49.000186837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4991178Z 2025-12-04T11:11:26.4991688Z [W1204 11:06:49.000920630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4991693Z 2025-12-04T11:11:26.4992198Z [W1204 11:06:49.001107807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4992207Z 2025-12-04T11:11:26.4992666Z W1204 11:06:49.492000 92127 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.4993169Z [W1204 11:06:49.190514124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4993176Z 2025-12-04T11:11:26.4993680Z [W1204 11:06:49.192252014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4993687Z 2025-12-04T11:11:26.4994202Z [W1204 11:06:49.192466941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4994206Z 2025-12-04T11:11:26.4994708Z [W1204 11:06:49.197113156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4994713Z 2025-12-04T11:11:26.4995228Z [W1204 11:06:49.197786576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4995233Z 2025-12-04T11:11:26.4995731Z [W1204 11:06:49.197980424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.4995738Z 2025-12-04T11:11:26.4996247Z [W1204 11:06:49.204602007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4996253Z 2025-12-04T11:11:26.4996749Z [W1204 11:06:49.205263362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4996753Z 2025-12-04T11:11:26.4997266Z [W1204 11:06:49.205453967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4997270Z 2025-12-04T11:11:26.4997399Z ('RERUN', {'yellow': True}) [18.9565s] [100%] 2025-12-04T11:11:26.4998641Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:50.655051408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4998661Z 2025-12-04T11:11:26.4999161Z [W1204 11:06:50.655794263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4999168Z 2025-12-04T11:11:26.4999664Z [W1204 11:06:50.655991688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.4999669Z 2025-12-04T11:11:26.5000177Z [W1204 11:06:50.660801611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5000182Z 2025-12-04T11:11:26.5000783Z [W1204 11:06:50.661450424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5000788Z 2025-12-04T11:11:26.5001581Z [W1204 11:06:50.661640073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5001675Z 2025-12-04T11:11:26.5002177Z [W1204 11:06:50.668187189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5002181Z 2025-12-04T11:11:26.5002736Z [W1204 11:06:50.668807784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5002741Z 2025-12-04T11:11:26.5003235Z [W1204 11:06:50.668994484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5003239Z 2025-12-04T11:11:26.5003758Z [W1204 11:06:50.774696537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5003764Z 2025-12-04T11:11:26.5004264Z [W1204 11:06:50.775467245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5004271Z 2025-12-04T11:11:26.5004767Z [W1204 11:06:50.775674185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5004790Z 2025-12-04T11:11:26.5005286Z [W1204 11:06:50.780331145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5005292Z 2025-12-04T11:11:26.5005787Z [W1204 11:06:50.781003320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5005791Z 2025-12-04T11:11:26.5006306Z [W1204 11:06:50.781198546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5006311Z 2025-12-04T11:11:26.5006806Z [W1204 11:06:50.787816984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5006813Z 2025-12-04T11:11:26.5007322Z [W1204 11:06:50.788502090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5007327Z 2025-12-04T11:11:26.5007824Z [W1204 11:06:50.788695536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5007831Z 2025-12-04T11:11:26.5007974Z ('RERUN', {'yellow': True}) [0.5439s] [100%] 2025-12-04T11:11:26.5009230Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:50.174239687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5009237Z 2025-12-04T11:11:26.5009736Z [W1204 11:06:50.174994514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5009762Z 2025-12-04T11:11:26.5010263Z [W1204 11:06:50.175194290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5010267Z 2025-12-04T11:11:26.5010766Z [W1204 11:06:50.180017002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5010772Z 2025-12-04T11:11:26.5011288Z [W1204 11:06:50.180660924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5011293Z 2025-12-04T11:11:26.5011793Z [W1204 11:06:50.180848909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5011905Z 2025-12-04T11:11:26.5012416Z [W1204 11:06:50.187406964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5012421Z 2025-12-04T11:11:26.5012950Z [W1204 11:06:50.188036148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5012955Z 2025-12-04T11:11:26.5013469Z [W1204 11:06:50.188220274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5013509Z 2025-12-04T11:11:26.5014010Z [W1204 11:06:50.294881054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5014014Z 2025-12-04T11:11:26.5014529Z [W1204 11:06:50.296109568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5014533Z 2025-12-04T11:11:26.5015032Z [W1204 11:06:50.296321565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5015037Z 2025-12-04T11:11:26.5015534Z [W1204 11:06:50.301708418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5015555Z 2025-12-04T11:11:26.5016054Z [W1204 11:06:50.302778422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5016061Z 2025-12-04T11:11:26.5016556Z [W1204 11:06:50.302978897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5016561Z 2025-12-04T11:11:26.5017071Z [W1204 11:06:50.310077266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5017075Z 2025-12-04T11:11:26.5017575Z [W1204 11:06:50.310792521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5017579Z 2025-12-04T11:11:26.5018093Z [W1204 11:06:50.310985693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5018099Z 2025-12-04T11:11:26.5018198Z FAILED [0.5210s] [100%] 2025-12-04T11:11:26.5018203Z 2025-12-04T11:11:26.5018356Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5018853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5018973Z Traceback (most recent call last): 2025-12-04T11:11:26.5019486Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5019714Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5020171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5020346Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5020876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5021090Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5021218Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5021226Z 2025-12-04T11:11:26.5021329Z Expected 1 but got 0. 
2025-12-04T11:11:26.5021447Z Absolute difference: 1 2025-12-04T11:11:26.5021553Z Relative difference: 1.0 2025-12-04T11:11:26.5021558Z 2025-12-04T11:11:26.5021781Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5022754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5022760Z 2025-12-04T11:11:26.5023025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5023282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5023395Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5024090Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5024341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5024438Z graph_break [] 2025-12-04T11:11:26.5024573Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5024785Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5025972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5026101Z if out == self.unknown_value: 2025-12-04T11:11:26.5026816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5026927Z warnings.warn( 2025-12-04T11:11:26.5027632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5027733Z warnings.warn( 2025-12-04T11:11:26.5028240Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5028361Z Traceback (most recent call last): 2025-12-04T11:11:26.5028875Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5029100Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5029546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5029722Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5030247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5030449Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5030590Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5030595Z 2025-12-04T11:11:26.5030697Z Expected 1 but got 0. 
2025-12-04T11:11:26.5030814Z Absolute difference: 1 2025-12-04T11:11:26.5030919Z Relative difference: 1.0 2025-12-04T11:11:26.5030923Z 2025-12-04T11:11:26.5031137Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5032048Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5032056Z 2025-12-04T11:11:26.5032317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5032540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5032654Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5033337Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5033570Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5033665Z graph_break [] 2025-12-04T11:11:26.5033786Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5034084Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5035265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5035419Z if out == self.unknown_value: 2025-12-04T11:11:26.5036130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5036259Z warnings.warn( 2025-12-04T11:11:26.5036975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5037071Z warnings.warn( 2025-12-04T11:11:26.5037298Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5037410Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5037639Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5038330Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5038427Z graph_break [] 2025-12-04T11:11:26.5038548Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5038771Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5039479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5039588Z warnings.warn( 2025-12-04T11:11:26.5040292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5040387Z warnings.warn( 2025-12-04T11:11:26.5040546Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5041037Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T11:11:26.5041174Z Traceback (most recent call last): 2025-12-04T11:11:26.5041741Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5041973Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5042438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5042598Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5043119Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5043340Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5043470Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5043475Z 2025-12-04T11:11:26.5043592Z Expected 1 but got 0. 2025-12-04T11:11:26.5043695Z Absolute difference: 1 2025-12-04T11:11:26.5043804Z Relative difference: 1.0 2025-12-04T11:11:26.5043808Z 2025-12-04T11:11:26.5044030Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5044911Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5044918Z 2025-12-04T11:11:26.5045195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5045408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5045520Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5046303Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5046525Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 
2025-12-04T11:11:26.5046652Z graph_break [] 2025-12-04T11:11:26.5046780Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5046988Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5048182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5048328Z if out == self.unknown_value: 2025-12-04T11:11:26.5049040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5049154Z warnings.warn( 2025-12-04T11:11:26.5049864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5049976Z warnings.warn( 2025-12-04T11:11:26.5050190Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5050301Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5050539Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5051216Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5051312Z graph_break [] 2025-12-04T11:11:26.5051444Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5051656Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5052381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5052478Z warnings.warn( 
2025-12-04T11:11:26.5053183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5053297Z warnings.warn( 2025-12-04T11:11:26.5053508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5053623Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5053860Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5054536Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5054646Z graph_break [] 2025-12-04T11:11:26.5054766Z aten_mm_info [('aten.mm_32_72_1024', 2)] 2025-12-04T11:11:26.5054981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5055705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5055806Z warnings.warn( 2025-12-04T11:11:26.5056522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5056620Z warnings.warn( 2025-12-04T11:11:26.5057444Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml - 2025-12-04T11:11:26.5057628Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5058612Z FAILED [0.5210s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.5058618Z 2025-12-04T11:11:26.5058735Z Expected 1 but got 0. 2025-12-04T11:11:26.5058842Z Absolute difference: 1 2025-12-04T11:11:26.5058979Z Relative difference: 1.0 2025-12-04T11:11:26.5058984Z 2025-12-04T11:11:26.5059209Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5060097Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5060132Z 2025-12-04T11:11:26.5060407Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5060582Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5060777Z ================== 1 failed, 10 deselected, 2 rerun in 20.05s ================== 2025-12-04T11:11:26.5060895Z Got exit code 1 2025-12-04T11:11:26.5061696Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5062100Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.5062547Z W1204 11:07:01.958000 92309 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5063193Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml 2025-12-04T11:11:26.5063371Z ============================= test session starts ============================== 2025-12-04T11:11:26.5063711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5063819Z cachedir: .pytest_cache 2025-12-04T11:11:26.5064391Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5064514Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5064630Z configfile: pytest.ini 2025-12-04T11:11:26.5065158Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5065368Z collecting ... collected 58 items / 8 deselected / 50 selected 2025-12-04T11:11:26.5065519Z stepcurrent: skipping 8 already run items. 2025-12-04T11:11:26.5065628Z Running 3 items in this shard 2025-12-04T11:11:26.5065633Z 2025-12-04T11:11:26.5066481Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7630s] [ 33%] 2025-12-04T11:11:26.5067330Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4101s] [ 33%] 2025-12-04T11:11:26.5068090Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4080s] [ 33%] 2025-12-04T11:11:26.5068098Z 2025-12-04T11:11:26.5068250Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5068749Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5068884Z Traceback (most recent call last): 2025-12-04T11:11:26.5069385Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5069611Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5070155Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5070321Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5070865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5071101Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5071233Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5071238Z 2025-12-04T11:11:26.5071355Z Expected 1 but got 2. 2025-12-04T11:11:26.5071489Z Absolute difference: 1 2025-12-04T11:11:26.5071603Z Relative difference: 1.0 2025-12-04T11:11:26.5071608Z 2025-12-04T11:11:26.5071833Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5072714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5072719Z 2025-12-04T11:11:26.5073002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5073220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5073336Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5073868Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5074090Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5074202Z graph_break [] 2025-12-04T11:11:26.5074415Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5075134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T11:11:26.5075247Z warnings.warn( 2025-12-04T11:11:26.5075960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5076056Z warnings.warn( 2025-12-04T11:11:26.5076561Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5076684Z Traceback (most recent call last): 2025-12-04T11:11:26.5077195Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5077421Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5077873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5078048Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5078573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5078790Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5078918Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5078923Z 2025-12-04T11:11:26.5079024Z Expected 1 but got 2. 
2025-12-04T11:11:26.5079141Z Absolute difference: 1 2025-12-04T11:11:26.5079246Z Relative difference: 1.0 2025-12-04T11:11:26.5079251Z 2025-12-04T11:11:26.5079459Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5080357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5080364Z 2025-12-04T11:11:26.5080628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5080856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5080970Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5081614Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5081857Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5081985Z graph_break [] 2025-12-04T11:11:26.5082213Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5082936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5083086Z warnings.warn( 2025-12-04T11:11:26.5083810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5083909Z warnings.warn( 2025-12-04T11:11:26.5084121Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5084249Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5084472Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5084999Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5085095Z graph_break [] 2025-12-04T11:11:26.5085305Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5086029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5086128Z warnings.warn( 2025-12-04T11:11:26.5086847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5086945Z warnings.warn( 2025-12-04T11:11:26.5087090Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5087594Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5087713Z Traceback (most recent call last): 2025-12-04T11:11:26.5088212Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5088451Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5088899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5089074Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5089596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5089796Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5089942Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5089947Z 2025-12-04T11:11:26.5090050Z Expected 1 but got 2. 
2025-12-04T11:11:26.5090153Z Absolute difference: 1 2025-12-04T11:11:26.5090272Z Relative difference: 1.0 2025-12-04T11:11:26.5090279Z 2025-12-04T11:11:26.5090487Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5091384Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5091391Z 2025-12-04T11:11:26.5091653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5091864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5091989Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5092566Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5092802Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5092896Z graph_break [] 2025-12-04T11:11:26.5093110Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5093868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5093996Z warnings.warn( 2025-12-04T11:11:26.5094712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5094807Z warnings.warn( 2025-12-04T11:11:26.5095016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5095143Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5095368Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5095880Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5095989Z graph_break [] 2025-12-04T11:11:26.5096197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5096919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5097017Z warnings.warn( 2025-12-04T11:11:26.5097716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5097823Z warnings.warn( 2025-12-04T11:11:26.5098033Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5098141Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5098380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5098891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5098999Z graph_break [] 2025-12-04T11:11:26.5099210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5099916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5100028Z warnings.warn( 2025-12-04T11:11:26.5100731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5101031Z warnings.warn( 2025-12-04T11:11:26.5101861Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml - 2025-12-04T11:11:26.5102030Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5102961Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5102967Z 2025-12-04T11:11:26.5103069Z Expected 1 but got 2. 2025-12-04T11:11:26.5103189Z Absolute difference: 1 2025-12-04T11:11:26.5103294Z Relative difference: 1.0 2025-12-04T11:11:26.5103299Z 2025-12-04T11:11:26.5103510Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5104407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5104538Z 2025-12-04T11:11:26.5104805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5105003Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5105240Z =================== 1 failed, 8 deselected, 2 rerun in 4.61s =================== 2025-12-04T11:11:26.5105337Z Got exit code 1 2025-12-04T11:11:26.5105458Z Retrying single test... 
2025-12-04T11:11:26.5105893Z W1204 11:07:21.929000 92478 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5106583Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml 2025-12-04T11:11:26.5106756Z ============================= test session starts ============================== 2025-12-04T11:11:26.5107096Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5107218Z cachedir: .pytest_cache 2025-12-04T11:11:26.5107724Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5107846Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5107966Z configfile: pytest.ini 2025-12-04T11:11:26.5108493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5108707Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5109676Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5109801Z Running 1 items in this shard 2025-12-04T11:11:26.5109806Z 2025-12-04T11:11:26.5111071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:25.096109213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5111079Z 2025-12-04T11:11:26.5111589Z [W1204 11:07:41.895403086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5111594Z 2025-12-04T11:11:26.5112109Z [W1204 11:07:41.895684721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5112117Z 2025-12-04T11:11:26.5112614Z [W1204 11:07:41.903302263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5112620Z 2025-12-04T11:11:26.5113133Z [W1204 11:07:41.904103517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5113141Z 2025-12-04T11:11:26.5113644Z [W1204 11:07:41.904299140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5113649Z 2025-12-04T11:11:26.5114150Z [W1204 11:07:41.911304172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5114169Z 2025-12-04T11:11:26.5114666Z [W1204 11:07:41.912046871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5114673Z 2025-12-04T11:11:26.5115172Z [W1204 11:07:41.912236410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5115177Z 2025-12-04T11:11:26.5115690Z [W1204 11:07:43.860969451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5115694Z 2025-12-04T11:11:26.5116256Z [W1204 11:07:43.862717857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5116261Z 2025-12-04T11:11:26.5116770Z [W1204 11:07:43.862921406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5116803Z 2025-12-04T11:11:26.5117304Z [W1204 11:07:43.866832964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5117337Z 2025-12-04T11:11:26.5117849Z [W1204 11:07:43.867494541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5117854Z 2025-12-04T11:11:26.5118351Z [W1204 11:07:43.867687451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5118356Z 2025-12-04T11:11:26.5118873Z [W1204 11:07:43.873683041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5118878Z 2025-12-04T11:11:26.5119378Z [W1204 11:07:43.874374385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5119385Z 2025-12-04T11:11:26.5119882Z [W1204 11:07:43.874564874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5119907Z 2025-12-04T11:11:26.5120039Z ('RERUN', {'yellow': True}) [19.5915s] [100%] 2025-12-04T11:11:26.5121287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:43.236709142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5121293Z 2025-12-04T11:11:26.5121871Z [W1204 11:07:43.237511092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5121877Z 2025-12-04T11:11:26.5122377Z [W1204 11:07:43.237718651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5122384Z 2025-12-04T11:11:26.5122898Z [W1204 11:07:43.241690374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5122905Z 2025-12-04T11:11:26.5123404Z [W1204 11:07:43.242532181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5123408Z 2025-12-04T11:11:26.5123921Z [W1204 11:07:43.242722598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5123926Z 2025-12-04T11:11:26.5124426Z [W1204 11:07:43.248589134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5124431Z 2025-12-04T11:11:26.5124928Z [W1204 11:07:43.249220746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5124949Z 2025-12-04T11:11:26.5125450Z [W1204 11:07:43.249405647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5125457Z 2025-12-04T11:11:26.5125955Z [W1204 11:07:43.336986710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5125960Z 2025-12-04T11:11:26.5126476Z [W1204 11:07:43.337784091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5126481Z 2025-12-04T11:11:26.5127040Z [W1204 11:07:43.337991768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5127045Z 2025-12-04T11:11:26.5127562Z [W1204 11:07:43.341962348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5127610Z 2025-12-04T11:11:26.5128107Z [W1204 11:07:43.342669075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5128111Z 2025-12-04T11:11:26.5128676Z [W1204 11:07:43.342863476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5128680Z 2025-12-04T11:11:26.5129176Z [W1204 11:07:43.348870113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5129180Z 2025-12-04T11:11:26.5129694Z [W1204 11:07:43.349755938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5129698Z 2025-12-04T11:11:26.5130197Z [W1204 11:07:43.349949727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5130204Z 2025-12-04T11:11:26.5130331Z ('RERUN', {'yellow': True}) [0.4368s] [100%] 2025-12-04T11:11:26.5131584Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:44.648017488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5131592Z 2025-12-04T11:11:26.5132093Z [W1204 11:07:44.648801020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5132097Z 2025-12-04T11:11:26.5132619Z [W1204 11:07:44.649002947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5132624Z 2025-12-04T11:11:26.5133124Z [W1204 11:07:44.652929666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5133132Z 2025-12-04T11:11:26.5133645Z [W1204 11:07:44.653767752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5133650Z 2025-12-04T11:11:26.5134143Z [W1204 11:07:44.653959206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5134150Z 2025-12-04T11:11:26.5134660Z [W1204 11:07:44.659862182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5134665Z 2025-12-04T11:11:26.5135163Z [W1204 11:07:44.660531722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5135168Z 2025-12-04T11:11:26.5135674Z [W1204 11:07:44.660724177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5135681Z 2025-12-04T11:11:26.5136174Z [W1204 11:07:44.747307708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5136178Z 2025-12-04T11:11:26.5136675Z [W1204 11:07:44.748074996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5136681Z 2025-12-04T11:11:26.5137192Z [W1204 11:07:44.748278663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5137197Z 2025-12-04T11:11:26.5137694Z [W1204 11:07:44.752134291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5137760Z 2025-12-04T11:11:26.5138273Z [W1204 11:07:44.752773003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5138277Z 2025-12-04T11:11:26.5138802Z [W1204 11:07:44.752965699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5138806Z 2025-12-04T11:11:26.5139311Z [W1204 11:07:44.758801988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5139346Z 2025-12-04T11:11:26.5139841Z [W1204 11:07:44.759589392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5139846Z 2025-12-04T11:11:26.5140350Z [W1204 11:07:44.759778107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5140355Z 2025-12-04T11:11:26.5140459Z FAILED [0.4072s] [100%] 2025-12-04T11:11:26.5140464Z 2025-12-04T11:11:26.5140604Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5141109Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5141230Z Traceback (most recent call last): 2025-12-04T11:11:26.5141738Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5141967Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5142419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5142592Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5143117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5143319Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5143462Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5143467Z 2025-12-04T11:11:26.5143571Z Expected 1 but got 2. 
2025-12-04T11:11:26.5143692Z Absolute difference: 1 2025-12-04T11:11:26.5143797Z Relative difference: 1.0 2025-12-04T11:11:26.5143801Z 2025-12-04T11:11:26.5144009Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5144904Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5144911Z 2025-12-04T11:11:26.5145172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5145400Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5145512Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5146073Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5146305Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5146402Z graph_break [] 2025-12-04T11:11:26.5146611Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5147799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5147914Z if out == self.unknown_value: 2025-12-04T11:11:26.5148636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5148735Z warnings.warn( 2025-12-04T11:11:26.5149591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5149704Z warnings.warn( 2025-12-04T11:11:26.5150230Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5150365Z Traceback (most recent call last): 2025-12-04T11:11:26.5151247Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5151504Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5151966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5152125Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5152665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5152866Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5152994Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5153001Z 2025-12-04T11:11:26.5153116Z Expected 1 but got 2. 
2025-12-04T11:11:26.5153220Z Absolute difference: 1 2025-12-04T11:11:26.5153326Z Relative difference: 1.0 2025-12-04T11:11:26.5153344Z 2025-12-04T11:11:26.5153553Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5154432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5154437Z 2025-12-04T11:11:26.5154712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5154925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5155041Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5155570Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5155792Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5155904Z graph_break [] 2025-12-04T11:11:26.5156118Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5157290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5157417Z if out == self.unknown_value: 2025-12-04T11:11:26.5158131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5158240Z warnings.warn( 2025-12-04T11:11:26.5158946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5159045Z warnings.warn( 2025-12-04T11:11:26.5159271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5159384Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5159609Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5160137Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5160231Z graph_break [] 2025-12-04T11:11:26.5160454Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5161219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5161317Z warnings.warn( 2025-12-04T11:11:26.5162105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5162246Z warnings.warn( 2025-12-04T11:11:26.5162404Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5162898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5163047Z Traceback (most recent call last): 2025-12-04T11:11:26.5163556Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5163782Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5164232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5164411Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5164936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5165151Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5165280Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5165285Z 2025-12-04T11:11:26.5165387Z Expected 1 but got 2. 2025-12-04T11:11:26.5165504Z Absolute difference: 1 2025-12-04T11:11:26.5165610Z Relative difference: 1.0 2025-12-04T11:11:26.5165615Z 2025-12-04T11:11:26.5165822Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5166712Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5166721Z 2025-12-04T11:11:26.5166982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5167207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5167322Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5167833Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5168066Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5168163Z graph_break [] 2025-12-04T11:11:26.5168385Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.5169561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5169677Z if out == self.unknown_value: 2025-12-04T11:11:26.5170398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5170498Z warnings.warn( 2025-12-04T11:11:26.5171215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5171311Z warnings.warn( 2025-12-04T11:11:26.5171524Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5171651Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5171875Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5172409Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5172509Z graph_break [] 2025-12-04T11:11:26.5172802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5173538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5173677Z warnings.warn( 2025-12-04T11:11:26.5174388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5174526Z warnings.warn( 2025-12-04T11:11:26.5174741Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5174868Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5175093Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5175618Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5175725Z graph_break [] 2025-12-04T11:11:26.5175937Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5176645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5176757Z warnings.warn( 2025-12-04T11:11:26.5177464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5177681Z warnings.warn( 2025-12-04T11:11:26.5178544Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml - 2025-12-04T11:11:26.5178756Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5179949Z FAILED [0.4072s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5179955Z 2025-12-04T11:11:26.5180186Z Expected 1 but got 2. 
2025-12-04T11:11:26.5188109Z Absolute difference: 1 2025-12-04T11:11:26.5188284Z Relative difference: 1.0 2025-12-04T11:11:26.5188291Z 2025-12-04T11:11:26.5188541Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5189456Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5189462Z 2025-12-04T11:11:26.5189731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5189926Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5190130Z ================== 1 failed, 10 deselected, 2 rerun in 20.47s ================== 2025-12-04T11:11:26.5190243Z Got exit code 1 2025-12-04T11:11:26.5190348Z Retrying single test... 2025-12-04T11:11:26.5190790Z W1204 11:07:55.556000 92652 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5191454Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml 2025-12-04T11:11:26.5191619Z ============================= test session starts ============================== 2025-12-04T11:11:26.5191980Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5192088Z cachedir: .pytest_cache 2025-12-04T11:11:26.5192599Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5192734Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5192991Z configfile: pytest.ini 2025-12-04T11:11:26.5193525Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.5193768Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5194749Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5194901Z Running 1 items in this shard 2025-12-04T11:11:26.5194906Z 2025-12-04T11:11:26.5196156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:59.692614152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5196162Z 2025-12-04T11:11:26.5196674Z [W1204 11:08:14.944741026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5196679Z 2025-12-04T11:11:26.5197193Z [W1204 11:08:14.944997489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5197201Z 2025-12-04T11:11:26.5197700Z [W1204 11:08:14.952347341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5197707Z 2025-12-04T11:11:26.5198222Z [W1204 11:08:14.953092592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5198226Z 2025-12-04T11:11:26.5198726Z [W1204 11:08:14.953283246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5198731Z 2025-12-04T11:11:26.5199235Z [W1204 11:08:14.960194788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5199255Z 2025-12-04T11:11:26.5199756Z [W1204 11:08:14.960887744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5199763Z 2025-12-04T11:11:26.5200258Z [W1204 11:08:14.961078458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5200265Z 2025-12-04T11:11:26.5200778Z [W1204 11:08:16.906920855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5200782Z 2025-12-04T11:11:26.5201549Z [W1204 11:08:16.908640077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5201555Z 2025-12-04T11:11:26.5202077Z [W1204 11:08:16.908843055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5202081Z 2025-12-04T11:11:26.5202578Z [W1204 11:08:16.912724829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5202586Z 2025-12-04T11:11:26.5203097Z [W1204 11:08:16.913376527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5203102Z 2025-12-04T11:11:26.5203598Z [W1204 11:08:16.913568523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5203603Z 2025-12-04T11:11:26.5204113Z [W1204 11:08:16.919508045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5204118Z 2025-12-04T11:11:26.5204745Z [W1204 11:08:16.920153636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5204751Z 2025-12-04T11:11:26.5205253Z [W1204 11:08:16.920348056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5205301Z 2025-12-04T11:11:26.5205443Z ('RERUN', {'yellow': True}) [19.0097s] [100%] 2025-12-04T11:11:26.5206688Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:08:16.281777427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5206736Z 2025-12-04T11:11:26.5207250Z [W1204 11:08:16.282559976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5207254Z 2025-12-04T11:11:26.5207755Z [W1204 11:08:16.282756834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5207760Z 2025-12-04T11:11:26.5208271Z [W1204 11:08:16.286660617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5208278Z 2025-12-04T11:11:26.5208779Z [W1204 11:08:16.287461036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5208783Z 2025-12-04T11:11:26.5209297Z [W1204 11:08:16.287650728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5209302Z 2025-12-04T11:11:26.5209798Z [W1204 11:08:16.293628739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5209803Z 2025-12-04T11:11:26.5210304Z [W1204 11:08:16.294267684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5210322Z 2025-12-04T11:11:26.5210822Z [W1204 11:08:16.294451033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5210829Z 2025-12-04T11:11:26.5211322Z [W1204 11:08:16.381661027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5211327Z 2025-12-04T11:11:26.5211839Z [W1204 11:08:16.382436194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5211847Z 2025-12-04T11:11:26.5212344Z [W1204 11:08:16.382632566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5212349Z 2025-12-04T11:11:26.5212861Z [W1204 11:08:16.386514550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5212866Z 2025-12-04T11:11:26.5213365Z [W1204 11:08:16.387176268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5213372Z 2025-12-04T11:11:26.5213886Z [W1204 11:08:16.387367770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5213891Z 2025-12-04T11:11:26.5214388Z [W1204 11:08:16.393429266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5214394Z 2025-12-04T11:11:26.5214901Z [W1204 11:08:16.394299259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5214905Z 2025-12-04T11:11:26.5215400Z [W1204 11:08:16.394491325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5215463Z 2025-12-04T11:11:26.5215592Z ('RERUN', {'yellow': True}) [0.4359s] [100%] 2025-12-04T11:11:26.5216840Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:08:17.694085868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5216875Z 2025-12-04T11:11:26.5217376Z [W1204 11:08:17.694868045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5217409Z 2025-12-04T11:11:26.5217919Z [W1204 11:08:17.695063346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5217924Z 2025-12-04T11:11:26.5218422Z [W1204 11:08:17.698966477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5218431Z 2025-12-04T11:11:26.5218939Z [W1204 11:08:17.699767145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5218943Z 2025-12-04T11:11:26.5219441Z [W1204 11:08:17.699952906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5219445Z 2025-12-04T11:11:26.5219955Z [W1204 11:08:17.705935317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5219962Z 2025-12-04T11:11:26.5220457Z [W1204 11:08:17.706605906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5220462Z 2025-12-04T11:11:26.5220959Z [W1204 11:08:17.706792643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5220978Z 2025-12-04T11:11:26.5221481Z [W1204 11:08:17.795012745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5221486Z 2025-12-04T11:11:26.5221986Z [W1204 11:08:17.795805866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5221992Z 2025-12-04T11:11:26.5222503Z [W1204 11:08:17.796004798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5222510Z 2025-12-04T11:11:26.5223006Z [W1204 11:08:17.799969690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5223010Z 2025-12-04T11:11:26.5223517Z [W1204 11:08:17.800639396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5223522Z 2025-12-04T11:11:26.5224022Z [W1204 11:08:17.800840771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5224026Z 2025-12-04T11:11:26.5224536Z [W1204 11:08:17.806800310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5224542Z 2025-12-04T11:11:26.5225040Z [W1204 11:08:17.807635215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5225047Z 2025-12-04T11:11:26.5225561Z [W1204 11:08:17.807876956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5225566Z 2025-12-04T11:11:26.5225663Z FAILED [0.4121s] [100%] 2025-12-04T11:11:26.5225668Z 2025-12-04T11:11:26.5225811Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5226396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5226517Z Traceback (most recent call last): 2025-12-04T11:11:26.5227029Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5227290Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5227743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5227949Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5228477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5228680Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5228824Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5228829Z 2025-12-04T11:11:26.5228932Z Expected 1 but got 2. 
2025-12-04T11:11:26.5229054Z Absolute difference: 1 2025-12-04T11:11:26.5229161Z Relative difference: 1.0 2025-12-04T11:11:26.5229165Z 2025-12-04T11:11:26.5229375Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5230272Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5230278Z 2025-12-04T11:11:26.5230541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5230772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5230885Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5231406Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5231642Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5231738Z graph_break [] 2025-12-04T11:11:26.5231953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5233142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5233259Z if out == self.unknown_value: 2025-12-04T11:11:26.5233984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5234083Z warnings.warn( 2025-12-04T11:11:26.5234782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5234892Z warnings.warn( 2025-12-04T11:11:26.5235391Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5235525Z Traceback (most recent call last): 2025-12-04T11:11:26.5236019Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5236248Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5236710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5236875Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5237411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5237617Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5237750Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5237756Z 2025-12-04T11:11:26.5237932Z Expected 1 but got 2. 
2025-12-04T11:11:26.5238038Z Absolute difference: 1 2025-12-04T11:11:26.5238143Z Relative difference: 1.0 2025-12-04T11:11:26.5238148Z 2025-12-04T11:11:26.5238372Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5239280Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5239315Z 2025-12-04T11:11:26.5239592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5239805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5239918Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5240448Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5240676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5240788Z graph_break [] 2025-12-04T11:11:26.5240998Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5242241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5242373Z if out == self.unknown_value: 2025-12-04T11:11:26.5243079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5243192Z warnings.warn( 2025-12-04T11:11:26.5243897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5243997Z warnings.warn( 2025-12-04T11:11:26.5244224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5244336Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5244558Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5245091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5245188Z graph_break [] 2025-12-04T11:11:26.5245415Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5246126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5246223Z warnings.warn( 2025-12-04T11:11:26.5246940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5247038Z warnings.warn( 2025-12-04T11:11:26.5247179Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5247684Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T11:11:26.5247803Z Traceback (most recent call last): 2025-12-04T11:11:26.5248315Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5248543Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5248990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5249163Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5249892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5250115Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5250245Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5250252Z 2025-12-04T11:11:26.5250385Z Expected 1 but got 2. 2025-12-04T11:11:26.5250552Z Absolute difference: 1 2025-12-04T11:11:26.5250689Z Relative difference: 1.0 2025-12-04T11:11:26.5250693Z 2025-12-04T11:11:26.5250908Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5251931Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5251937Z 2025-12-04T11:11:26.5252202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5252431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5252555Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5253075Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5253313Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5253412Z graph_break [] 2025-12-04T11:11:26.5253636Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.5254813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5254929Z if out == self.unknown_value: 2025-12-04T11:11:26.5255657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5255761Z warnings.warn( 2025-12-04T11:11:26.5256481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5256580Z warnings.warn( 2025-12-04T11:11:26.5256790Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5256920Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5257144Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5257663Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5257775Z graph_break [] 2025-12-04T11:11:26.5257987Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5258711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5258807Z warnings.warn( 2025-12-04T11:11:26.5259511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5259625Z warnings.warn( 2025-12-04T11:11:26.5259832Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5259947Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5260185Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5260701Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5260809Z graph_break [] 2025-12-04T11:11:26.5261019Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5261823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5261934Z warnings.warn( 2025-12-04T11:11:26.5262635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5262786Z warnings.warn( 2025-12-04T11:11:26.5263608Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml - 2025-12-04T11:11:26.5263829Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5264769Z FAILED [0.4121s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5264776Z 2025-12-04T11:11:26.5264882Z Expected 1 but got 2. 
2025-12-04T11:11:26.5264999Z Absolute difference: 1 2025-12-04T11:11:26.5265104Z Relative difference: 1.0 2025-12-04T11:11:26.5265109Z 2025-12-04T11:11:26.5265321Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5266220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5266227Z 2025-12-04T11:11:26.5266491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5266684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5266877Z ================== 1 failed, 10 deselected, 2 rerun in 19.89s ================== 2025-12-04T11:11:26.5266973Z Got exit code 1 2025-12-04T11:11:26.5267788Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T11:11:26.5268190Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.5268629Z W1204 11:08:28.626000 92826 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5269294Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml 2025-12-04T11:11:26.5269459Z ============================= test session starts ============================== 2025-12-04T11:11:26.5269813Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5269922Z cachedir: .pytest_cache 2025-12-04T11:11:26.5270435Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.5270575Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5270682Z configfile: pytest.ini 2025-12-04T11:11:26.5271227Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5271443Z collecting ... collected 58 items / 9 deselected / 49 selected 2025-12-04T11:11:26.5271582Z stepcurrent: skipping 9 already run items. 2025-12-04T11:11:26.5271709Z Running 2 items in this shard 2025-12-04T11:11:26.5271716Z 2025-12-04T11:11:26.5272568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8097s] [ 50%] 2025-12-04T11:11:26.5273490Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4874s] [ 50%] 2025-12-04T11:11:26.5274355Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4819s] [ 50%] 2025-12-04T11:11:26.5274361Z 2025-12-04T11:11:26.5274501Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5275043Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5275164Z Traceback (most recent call last): 2025-12-04T11:11:26.5275720Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5275952Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5276411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5276585Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5277118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5277339Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5277470Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5277475Z 2025-12-04T11:11:26.5277578Z Expected 1 but got 2. 2025-12-04T11:11:26.5277697Z Absolute difference: 1 2025-12-04T11:11:26.5277806Z Relative difference: 1.0 2025-12-04T11:11:26.5277811Z 2025-12-04T11:11:26.5278022Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5278930Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5278936Z 2025-12-04T11:11:26.5279199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5279435Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5279548Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5280066Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5280304Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5280401Z graph_break [] 2025-12-04T11:11:26.5280628Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5281347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5281446Z warnings.warn( 2025-12-04T11:11:26.5282242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5282347Z warnings.warn( 2025-12-04T11:11:26.5282852Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5282989Z Traceback (most recent call last): 2025-12-04T11:11:26.5283492Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5283734Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5284182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5284349Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5284892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5285095Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5285243Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5285343Z 2025-12-04T11:11:26.5285446Z Expected 1 but got 2. 
2025-12-04T11:11:26.5285551Z Absolute difference: 1 2025-12-04T11:11:26.5285669Z Relative difference: 1.0 2025-12-04T11:11:26.5285674Z 2025-12-04T11:11:26.5285916Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5286806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5286855Z 2025-12-04T11:11:26.5287116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5287330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5287458Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5287979Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5288204Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5288311Z graph_break [] 2025-12-04T11:11:26.5288521Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5289253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5289350Z warnings.warn( 2025-12-04T11:11:26.5290061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5290170Z warnings.warn( 2025-12-04T11:11:26.5290387Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5290497Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5290735Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5291250Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5291358Z graph_break [] 2025-12-04T11:11:26.5291570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5292286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5292393Z warnings.warn( 2025-12-04T11:11:26.5293099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5293207Z warnings.warn( 2025-12-04T11:11:26.5293350Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5293853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5293984Z Traceback (most recent call last): 2025-12-04T11:11:26.5294489Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5294718Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5295181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5295343Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5295882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5296083Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5296212Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5296217Z 2025-12-04T11:11:26.5296332Z Expected 1 but got 2. 
2025-12-04T11:11:26.5296587Z Absolute difference: 1 2025-12-04T11:11:26.5296697Z Relative difference: 1.0 2025-12-04T11:11:26.5296702Z 2025-12-04T11:11:26.5296928Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5297822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5298409Z 2025-12-04T11:11:26.5298691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5298943Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5299057Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5299592Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5299815Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5299928Z graph_break [] 2025-12-04T11:11:26.5300140Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5301042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5301160Z warnings.warn( 2025-12-04T11:11:26.5301870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5301984Z warnings.warn( 2025-12-04T11:11:26.5302194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5302304Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5302538Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5303059Z inductor 
[('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5303154Z graph_break [] 2025-12-04T11:11:26.5303379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5304085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5304197Z warnings.warn( 2025-12-04T11:11:26.5304897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5304994Z warnings.warn( 2025-12-04T11:11:26.5305217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5305327Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5305547Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5306081Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5306175Z graph_break [] 2025-12-04T11:11:26.5306400Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5307110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5307206Z warnings.warn( 2025-12-04T11:11:26.5307922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5308020Z warnings.warn( 2025-12-04T11:11:26.5308853Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml - 2025-12-04T11:11:26.5309168Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5310095Z FAILED [0.4819s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5310166Z 2025-12-04T11:11:26.5310288Z Expected 1 but got 2. 2025-12-04T11:11:26.5310394Z Absolute difference: 1 2025-12-04T11:11:26.5310503Z Relative difference: 1.0 2025-12-04T11:11:26.5310523Z 2025-12-04T11:11:26.5310782Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5311672Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5311677Z 2025-12-04T11:11:26.5311952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5312137Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5312345Z =================== 1 failed, 9 deselected, 2 rerun in 4.81s =================== 2025-12-04T11:11:26.5312440Z Got exit code 1 2025-12-04T11:11:26.5312546Z Retrying single test... 
2025-12-04T11:11:26.5313002Z W1204 11:08:48.570000 93002 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5313650Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml 2025-12-04T11:11:26.5313812Z ============================= test session starts ============================== 2025-12-04T11:11:26.5314171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5314279Z cachedir: .pytest_cache 2025-12-04T11:11:26.5314806Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5314927Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5315030Z configfile: pytest.ini 2025-12-04T11:11:26.5315571Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5315788Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5316750Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5316876Z Running 1 items in this shard 2025-12-04T11:11:26.5316881Z 2025-12-04T11:11:26.5318133Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:08:52.791913890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5318139Z 2025-12-04T11:11:26.5318659Z [W1204 11:09:07.065180470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5318667Z 2025-12-04T11:11:26.5319166Z [W1204 11:09:07.065436755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5319197Z 2025-12-04T11:11:26.5319698Z [W1204 11:09:07.072816011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5319703Z 2025-12-04T11:11:26.5320200Z [W1204 11:09:07.073546411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5320205Z 2025-12-04T11:11:26.5320787Z [W1204 11:09:07.073731562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5320792Z 2025-12-04T11:11:26.5321290Z [W1204 11:09:07.080557560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5321324Z 2025-12-04T11:11:26.5321928Z [W1204 11:09:07.081221060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5321935Z 2025-12-04T11:11:26.5322437Z [W1204 11:09:07.081402320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5322487Z 2025-12-04T11:11:26.5322999Z [W1204 11:09:09.032476697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5323004Z 2025-12-04T11:11:26.5323507Z [W1204 11:09:09.034189229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5323511Z 2025-12-04T11:11:26.5324026Z [W1204 11:09:09.034391449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5324032Z 2025-12-04T11:11:26.5324528Z [W1204 11:09:09.038208488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5324533Z 2025-12-04T11:11:26.5325028Z [W1204 11:09:09.038842622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5325049Z 2025-12-04T11:11:26.5325549Z [W1204 11:09:09.039031517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5325553Z 2025-12-04T11:11:26.5326052Z [W1204 11:09:09.044913055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5326061Z 2025-12-04T11:11:26.5326574Z [W1204 11:09:09.045537624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5326579Z 2025-12-04T11:11:26.5327077Z [W1204 11:09:09.045725149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5327081Z 2025-12-04T11:11:26.5327228Z ('RERUN', {'yellow': True}) [19.1233s] [100%] 2025-12-04T11:11:26.5328479Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:09.482210351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5328487Z 2025-12-04T11:11:26.5329001Z [W1204 11:09:09.482985376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5329009Z 2025-12-04T11:11:26.5329512Z [W1204 11:09:09.483181610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5329517Z 2025-12-04T11:11:26.5330031Z [W1204 11:09:09.487043780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5330036Z 2025-12-04T11:11:26.5330531Z [W1204 11:09:09.487866062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5330538Z 2025-12-04T11:11:26.5331034Z [W1204 11:09:09.488052338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5331039Z 2025-12-04T11:11:26.5331552Z [W1204 11:09:09.494020881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5331556Z 2025-12-04T11:11:26.5332126Z [W1204 11:09:09.494706595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5332132Z 2025-12-04T11:11:26.5332646Z [W1204 11:09:09.494893338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5332688Z 2025-12-04T11:11:26.5333187Z [W1204 11:09:09.583668041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5333222Z 2025-12-04T11:11:26.5333734Z [W1204 11:09:09.584457827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5333739Z 2025-12-04T11:11:26.5334237Z [W1204 11:09:09.584660440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5334242Z 2025-12-04T11:11:26.5334756Z [W1204 11:09:09.588498667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5334761Z 2025-12-04T11:11:26.5335261Z [W1204 11:09:09.589130188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5335267Z 2025-12-04T11:11:26.5335765Z [W1204 11:09:09.589319533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5335784Z 2025-12-04T11:11:26.5336281Z [W1204 11:09:09.595198599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5336285Z 2025-12-04T11:11:26.5336784Z [W1204 11:09:09.596033506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5336789Z 2025-12-04T11:11:26.5337303Z [W1204 11:09:09.596222025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5337307Z 2025-12-04T11:11:26.5337434Z ('RERUN', {'yellow': True}) [0.5124s] [100%] 2025-12-04T11:11:26.5338701Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:10.974726090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5338709Z 2025-12-04T11:11:26.5339209Z [W1204 11:09:10.975503333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5339213Z 2025-12-04T11:11:26.5339727Z [W1204 11:09:10.975699747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5339732Z 2025-12-04T11:11:26.5340234Z [W1204 11:09:10.979621523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5340238Z 2025-12-04T11:11:26.5340754Z [W1204 11:09:10.980462289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5340761Z 2025-12-04T11:11:26.5341261Z [W1204 11:09:10.980655901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5341268Z 2025-12-04T11:11:26.5341762Z [W1204 11:09:10.986591408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5341778Z 2025-12-04T11:11:26.5342272Z [W1204 11:09:10.987279703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5342277Z 2025-12-04T11:11:26.5342841Z [W1204 11:09:10.987466204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5342846Z 2025-12-04T11:11:26.5343350Z [W1204 11:09:10.075543958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5343386Z 2025-12-04T11:11:26.5343884Z [W1204 11:09:10.076313662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5343888Z 2025-12-04T11:11:26.5344433Z [W1204 11:09:10.076511495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5344438Z 2025-12-04T11:11:26.5344932Z [W1204 11:09:10.080352593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5344936Z 2025-12-04T11:11:26.5345445Z [W1204 11:09:10.080978799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5345449Z 2025-12-04T11:11:26.5345950Z [W1204 11:09:10.081166826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5345956Z 2025-12-04T11:11:26.5346464Z [W1204 11:09:10.086946826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5346469Z 2025-12-04T11:11:26.5346966Z [W1204 11:09:10.087729460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5346972Z 2025-12-04T11:11:26.5347466Z [W1204 11:09:10.087916252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5347471Z 2025-12-04T11:11:26.5347580Z FAILED [0.4920s] [100%] 2025-12-04T11:11:26.5347585Z 2025-12-04T11:11:26.5347731Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5348248Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5348373Z Traceback (most recent call last): 2025-12-04T11:11:26.5348880Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5349119Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5349577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5349754Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5350286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5350488Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5350632Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5350638Z 2025-12-04T11:11:26.5350745Z Expected 1 but got 2. 
2025-12-04T11:11:26.5350855Z Absolute difference: 1 2025-12-04T11:11:26.5350981Z Relative difference: 1.0 2025-12-04T11:11:26.5350988Z 2025-12-04T11:11:26.5351202Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5352105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5352112Z 2025-12-04T11:11:26.5352375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5352590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5352721Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5353335Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5353575Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5353673Z graph_break [] 2025-12-04T11:11:26.5353884Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5355109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5355255Z if out == self.unknown_value: 2025-12-04T11:11:26.5356030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5356131Z warnings.warn( 2025-12-04T11:11:26.5356845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5356954Z warnings.warn( 2025-12-04T11:11:26.5357454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5357574Z Traceback (most recent call last): 2025-12-04T11:11:26.5358082Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5358310Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5358770Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5358933Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5359458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5359680Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5359812Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5359817Z 2025-12-04T11:11:26.5359950Z Expected 1 but got 2. 
2025-12-04T11:11:26.5360054Z Absolute difference: 1 2025-12-04T11:11:26.5360168Z Relative difference: 1.0 2025-12-04T11:11:26.5360173Z 2025-12-04T11:11:26.5360396Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5361280Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5361287Z 2025-12-04T11:11:26.5361628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5361848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5361962Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5362495Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5362718Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5362815Z graph_break [] 2025-12-04T11:11:26.5363046Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5364233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5364364Z if out == self.unknown_value: 2025-12-04T11:11:26.5365074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5365176Z warnings.warn( 2025-12-04T11:11:26.5365981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5366079Z warnings.warn( 2025-12-04T11:11:26.5366304Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5366450Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5366670Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5367200Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5367334Z graph_break [] 2025-12-04T11:11:26.5367545Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5368269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5368366Z warnings.warn( 2025-12-04T11:11:26.5369082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5369177Z warnings.warn( 2025-12-04T11:11:26.5369320Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5369834Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5369952Z Traceback (most recent call last): 2025-12-04T11:11:26.5370450Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5370689Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5371138Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5371312Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5371841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5372042Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5372186Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5372191Z 2025-12-04T11:11:26.5372293Z Expected 1 but got 2. 2025-12-04T11:11:26.5372409Z Absolute difference: 1 2025-12-04T11:11:26.5372517Z Relative difference: 1.0 2025-12-04T11:11:26.5372524Z 2025-12-04T11:11:26.5372735Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5373632Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5373637Z 2025-12-04T11:11:26.5373898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5374127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5374240Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5374760Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5374998Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5375093Z graph_break [] 2025-12-04T11:11:26.5375303Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.5376495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5376610Z if out == self.unknown_value: 2025-12-04T11:11:26.5377410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5377509Z warnings.warn( 2025-12-04T11:11:26.5378214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5378357Z warnings.warn( 2025-12-04T11:11:26.5378570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5378728Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5378954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5379467Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5379580Z graph_break [] 2025-12-04T11:11:26.5379792Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5380502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5380613Z warnings.warn( 2025-12-04T11:11:26.5381322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5381433Z warnings.warn( 2025-12-04T11:11:26.5381643Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5381757Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5381994Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5382509Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5382605Z graph_break [] 2025-12-04T11:11:26.5382831Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5383536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5383648Z warnings.warn( 2025-12-04T11:11:26.5384350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5384447Z warnings.warn( 2025-12-04T11:11:26.5385280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml - 2025-12-04T11:11:26.5385450Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5386386Z FAILED [0.4920s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5386393Z 2025-12-04T11:11:26.5386495Z Expected 1 but got 2. 
2025-12-04T11:11:26.5386597Z Absolute difference: 1 2025-12-04T11:11:26.5386714Z Relative difference: 1.0 2025-12-04T11:11:26.5386721Z 2025-12-04T11:11:26.5386931Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5387828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5387835Z 2025-12-04T11:11:26.5388095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5388273Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5388481Z ================== 1 failed, 10 deselected, 2 rerun in 20.16s ================== 2025-12-04T11:11:26.5388575Z Got exit code 1 2025-12-04T11:11:26.5388761Z Retrying single test... 2025-12-04T11:11:26.5389203Z W1204 11:09:21.990000 93184 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5389848Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml 2025-12-04T11:11:26.5390056Z ============================= test session starts ============================== 2025-12-04T11:11:26.5390400Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5390538Z cachedir: .pytest_cache 2025-12-04T11:11:26.5391066Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5391188Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5391308Z configfile: pytest.ini 2025-12-04T11:11:26.5391842Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:11:26.5392058Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5393043Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5393157Z Running 1 items in this shard 2025-12-04T11:11:26.5393162Z 2025-12-04T11:11:26.5394432Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:25.195906126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5394438Z 2025-12-04T11:11:26.5394948Z [W1204 11:09:40.398115658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5394953Z 2025-12-04T11:11:26.5395473Z [W1204 11:09:40.398378255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5395480Z 2025-12-04T11:11:26.5395981Z [W1204 11:09:40.405598016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5395986Z 2025-12-04T11:11:26.5396484Z [W1204 11:09:40.406345618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5396505Z 2025-12-04T11:11:26.5397004Z [W1204 11:09:40.406535097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5397009Z 2025-12-04T11:11:26.5397505Z [W1204 11:09:40.413450371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5397513Z 2025-12-04T11:11:26.5398031Z [W1204 11:09:40.414163491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5398036Z 2025-12-04T11:11:26.5398538Z [W1204 11:09:40.414347930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5398543Z 2025-12-04T11:11:26.5399055Z [W1204 11:09:42.359772077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5399062Z 2025-12-04T11:11:26.5399559Z [W1204 11:09:42.361583001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5399564Z 2025-12-04T11:11:26.5400078Z [W1204 11:09:42.361804725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5400083Z 2025-12-04T11:11:26.5400667Z [W1204 11:09:42.365880068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5400672Z 2025-12-04T11:11:26.5401389Z [W1204 11:09:42.366575751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5401564Z 2025-12-04T11:11:26.5402070Z [W1204 11:09:42.366770200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5402124Z 2025-12-04T11:11:26.5402624Z [W1204 11:09:42.372863303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5402643Z 2025-12-04T11:11:26.5403143Z [W1204 11:09:42.373555028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5403148Z 2025-12-04T11:11:26.5403654Z [W1204 11:09:42.373746518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5403659Z 2025-12-04T11:11:26.5403805Z ('RERUN', {'yellow': True}) [19.0303s] [100%] 2025-12-04T11:11:26.5405065Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:43.814019647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5405072Z 2025-12-04T11:11:26.5405595Z [W1204 11:09:43.814802242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5405599Z 2025-12-04T11:11:26.5406102Z [W1204 11:09:43.815000342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5406106Z 2025-12-04T11:11:26.5406623Z [W1204 11:09:43.818921691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5406628Z 2025-12-04T11:11:26.5407125Z [W1204 11:09:43.819733326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5407131Z 2025-12-04T11:11:26.5407632Z [W1204 11:09:43.819931608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5407656Z 2025-12-04T11:11:26.5408153Z [W1204 11:09:43.825921727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5408158Z 2025-12-04T11:11:26.5408662Z [W1204 11:09:43.826581911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5408666Z 2025-12-04T11:11:26.5409180Z [W1204 11:09:43.826768185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5409185Z 2025-12-04T11:11:26.5409685Z [W1204 11:09:43.916986640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5409692Z 2025-12-04T11:11:26.5410205Z [W1204 11:09:43.917791597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5410212Z 2025-12-04T11:11:26.5410713Z [W1204 11:09:43.917997570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5410718Z 2025-12-04T11:11:26.5411226Z [W1204 11:09:43.921991175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5411231Z 2025-12-04T11:11:26.5411818Z [W1204 11:09:43.922648392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5411823Z 2025-12-04T11:11:26.5412334Z [W1204 11:09:43.922840997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5412367Z 2025-12-04T11:11:26.5412864Z [W1204 11:09:43.928804275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5412868Z 2025-12-04T11:11:26.5413398Z [W1204 11:09:43.929616898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5413415Z 2025-12-04T11:11:26.5413911Z [W1204 11:09:43.929809630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5413916Z 2025-12-04T11:11:26.5414078Z ('RERUN', {'yellow': True}) [0.5175s] [100%] 2025-12-04T11:11:26.5415342Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:43.306775501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5415349Z 2025-12-04T11:11:26.5415855Z [W1204 11:09:43.307541547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5415861Z 2025-12-04T11:11:26.5416371Z [W1204 11:09:43.307739054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5416376Z 2025-12-04T11:11:26.5416873Z [W1204 11:09:43.311682413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5416878Z 2025-12-04T11:11:26.5417393Z [W1204 11:09:43.312488741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5417397Z 2025-12-04T11:11:26.5417894Z [W1204 11:09:43.312676250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5417900Z 2025-12-04T11:11:26.5418406Z [W1204 11:09:43.318675309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5418411Z 2025-12-04T11:11:26.5418909Z [W1204 11:09:43.319320204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5418913Z 2025-12-04T11:11:26.5419410Z [W1204 11:09:43.319505826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5419415Z 2025-12-04T11:11:26.5419931Z [W1204 11:09:43.408899566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5419935Z 2025-12-04T11:11:26.5420431Z [W1204 11:09:43.409718524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5420438Z 2025-12-04T11:11:26.5420945Z [W1204 11:09:43.409927851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5420950Z 2025-12-04T11:11:26.5421448Z [W1204 11:09:43.413952563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5421454Z 2025-12-04T11:11:26.5421965Z [W1204 11:09:43.414652950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5421969Z 2025-12-04T11:11:26.5422532Z [W1204 11:09:43.414850077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5422537Z 2025-12-04T11:11:26.5423054Z [W1204 11:09:43.420933415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5423092Z 2025-12-04T11:11:26.5423589Z [W1204 11:09:43.421808858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5423594Z 2025-12-04T11:11:26.5424089Z [W1204 11:09:43.422005226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5424137Z 2025-12-04T11:11:26.5424239Z FAILED [0.4931s] [100%] 2025-12-04T11:11:26.5424244Z 2025-12-04T11:11:26.5424386Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5424899Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5425025Z Traceback (most recent call last): 2025-12-04T11:11:26.5425528Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5425767Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5426224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5426396Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5426923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5427125Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5427267Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5427271Z 2025-12-04T11:11:26.5427375Z Expected 1 but got 2. 
2025-12-04T11:11:26.5427490Z Absolute difference: 1 2025-12-04T11:11:26.5427598Z Relative difference: 1.0 2025-12-04T11:11:26.5427606Z 2025-12-04T11:11:26.5427816Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5428723Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5428731Z 2025-12-04T11:11:26.5428990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5429220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5429333Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5429855Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5430086Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5430182Z graph_break [] 2025-12-04T11:11:26.5430399Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5431599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5431714Z if out == self.unknown_value: 2025-12-04T11:11:26.5432439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5432540Z warnings.warn( 2025-12-04T11:11:26.5433243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5433353Z warnings.warn( 2025-12-04T11:11:26.5433916Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5434051Z Traceback (most recent call last): 2025-12-04T11:11:26.5434545Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5434802Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5435263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5435424Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5435996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5436214Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5436344Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5436349Z 2025-12-04T11:11:26.5436470Z Expected 1 but got 2. 
2025-12-04T11:11:26.5436578Z Absolute difference: 1 2025-12-04T11:11:26.5436690Z Relative difference: 1.0 2025-12-04T11:11:26.5436695Z 2025-12-04T11:11:26.5436917Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5437807Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5437814Z 2025-12-04T11:11:26.5438090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5438305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5438419Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5438952Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5439175Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5439271Z graph_break [] 2025-12-04T11:11:26.5439502Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5440678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5440805Z if out == self.unknown_value: 2025-12-04T11:11:26.5441608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5441713Z warnings.warn( 2025-12-04T11:11:26.5442433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5442531Z warnings.warn( 2025-12-04T11:11:26.5442762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5442875Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5443097Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5443631Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5443728Z graph_break [] 2025-12-04T11:11:26.5443939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5444666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5444762Z warnings.warn( 2025-12-04T11:11:26.5445478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5445574Z warnings.warn( 2025-12-04T11:11:26.5445794Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5446311Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T11:11:26.5446459Z Traceback (most recent call last): 2025-12-04T11:11:26.5446970Z File 
"/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5447196Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5447673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5447848Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5448370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5448575Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5448714Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5448720Z 2025-12-04T11:11:26.5448822Z Expected 1 but got 2. 2025-12-04T11:11:26.5448937Z Absolute difference: 1 2025-12-04T11:11:26.5449044Z Relative difference: 1.0 2025-12-04T11:11:26.5449050Z 2025-12-04T11:11:26.5449259Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5450156Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5450164Z 2025-12-04T11:11:26.5450426Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5450653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5450763Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5451283Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5451517Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5451613Z graph_break [] 2025-12-04T11:11:26.5451830Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:11:26.5453028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5453150Z if out == self.unknown_value: 2025-12-04T11:11:26.5453879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5453977Z warnings.warn( 2025-12-04T11:11:26.5454688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5454801Z warnings.warn( 2025-12-04T11:11:26.5455014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5455142Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5455364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5455879Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5455991Z graph_break [] 2025-12-04T11:11:26.5456202Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5456922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5457022Z warnings.warn( 2025-12-04T11:11:26.5457787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5457904Z warnings.warn( 2025-12-04T11:11:26.5458144Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5458253Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5458485Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5458998Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)] 2025-12-04T11:11:26.5459136Z graph_break [] 2025-12-04T11:11:26.5459346Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5460053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5460168Z warnings.warn( 2025-12-04T11:11:26.5460874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5460976Z warnings.warn( 2025-12-04T11:11:26.5461819Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml - 2025-12-04T11:11:26.5461987Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5462926Z FAILED [0.4931s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5462932Z 2025-12-04T11:11:26.5463039Z Expected 1 but got 2. 
2025-12-04T11:11:26.5463144Z Absolute difference: 1 2025-12-04T11:11:26.5463274Z Relative difference: 1.0 2025-12-04T11:11:26.5463279Z 2025-12-04T11:11:26.5463493Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5464402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5464409Z 2025-12-04T11:11:26.5464675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5464855Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5465065Z ================== 1 failed, 10 deselected, 2 rerun in 20.07s ================== 2025-12-04T11:11:26.5465168Z Got exit code 1 2025-12-04T11:11:26.5465988Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T11:11:26.5466393Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.5466836Z W1204 11:09:55.242000 93366 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5467495Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml 2025-12-04T11:11:26.5467659Z ============================= test session starts ============================== 2025-12-04T11:11:26.5468020Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5468130Z cachedir: .pytest_cache 2025-12-04T11:11:26.5468640Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 
2025-12-04T11:11:26.5468809Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5468915Z configfile: pytest.ini 2025-12-04T11:11:26.5469512Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5469746Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5469915Z stepcurrent: skipping 10 already run items. 2025-12-04T11:11:26.5470043Z Running 1 items in this shard 2025-12-04T11:11:26.5470048Z 2025-12-04T11:11:26.5470892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.0462s] [100%] 2025-12-04T11:11:26.5471852Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4497s] [100%] 2025-12-04T11:11:26.5472628Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4494s] [100%] 2025-12-04T11:11:26.5472634Z 2025-12-04T11:11:26.5472773Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5473275Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5473398Z Traceback (most recent call last): 2025-12-04T11:11:26.5473913Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5474144Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5474592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5474766Z return 
super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5475295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5475498Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5475644Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5475649Z 2025-12-04T11:11:26.5475756Z Expected 1 but got 2. 2025-12-04T11:11:26.5475873Z Absolute difference: 1 2025-12-04T11:11:26.5475977Z Relative difference: 1.0 2025-12-04T11:11:26.5475982Z 2025-12-04T11:11:26.5476193Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5477082Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5477089Z 2025-12-04T11:11:26.5477351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5477576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5477691Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5478558Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5478793Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5478888Z graph_break [] 2025-12-04T11:11:26.5479098Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5479833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5479931Z warnings.warn( 2025-12-04T11:11:26.5480650Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5480748Z warnings.warn( 2025-12-04T11:11:26.5481302Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5481435Z Traceback (most recent call last): 2025-12-04T11:11:26.5482008Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5482323Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5482775Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5482969Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5483509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5483710Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5483836Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5483855Z 2025-12-04T11:11:26.5483963Z Expected 1 but got 2. 
2025-12-04T11:11:26.5484066Z Absolute difference: 1 2025-12-04T11:11:26.5484191Z Relative difference: 1.0 2025-12-04T11:11:26.5484196Z 2025-12-04T11:11:26.5484407Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5485287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5485294Z 2025-12-04T11:11:26.5485569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5485783Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5485909Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5486781Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5487003Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5487113Z graph_break [] 2025-12-04T11:11:26.5487325Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5488054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5488153Z warnings.warn( 2025-12-04T11:11:26.5488857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5488965Z warnings.warn( 2025-12-04T11:11:26.5489179Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5489288Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5489528Z aot_autograd [('total', 2), 
('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5490400Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5490511Z graph_break [] 2025-12-04T11:11:26.5490721Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5491428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5491540Z warnings.warn( 2025-12-04T11:11:26.5492246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5492354Z warnings.warn( 2025-12-04T11:11:26.5492554Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5493051Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5493185Z Traceback (most recent call last): 2025-12-04T11:11:26.5493719Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5493947Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5494406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5494597Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5495133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5495334Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5495463Z 
AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5495472Z 2025-12-04T11:11:26.5495589Z Expected 1 but got 2. 2025-12-04T11:11:26.5495693Z Absolute difference: 1 2025-12-04T11:11:26.5495799Z Relative difference: 1.0 2025-12-04T11:11:26.5495816Z 2025-12-04T11:11:26.5496028Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5496906Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5496914Z 2025-12-04T11:11:26.5497189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5497403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5497515Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5498395Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5498616Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5498725Z graph_break [] 2025-12-04T11:11:26.5498940Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5499655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5499766Z warnings.warn( 2025-12-04T11:11:26.5500473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5500585Z warnings.warn( 2025-12-04T11:11:26.5500798Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:11:26.5501157Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5501399Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5502268Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5502381Z graph_break [] 2025-12-04T11:11:26.5502592Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5503301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5503416Z warnings.warn( 2025-12-04T11:11:26.5504115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5504211Z warnings.warn( 2025-12-04T11:11:26.5504590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5504704Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5504944Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5505851Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5505948Z graph_break [] 2025-12-04T11:11:26.5506226Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5506935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5507045Z warnings.warn( 
2025-12-04T11:11:26.5507756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5507851Z warnings.warn( 2025-12-04T11:11:26.5508681Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml - 2025-12-04T11:11:26.5508852Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5509761Z FAILED [0.4494s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5509781Z 2025-12-04T11:11:26.5509883Z Expected 1 but got 2. 2025-12-04T11:11:26.5509985Z Absolute difference: 1 2025-12-04T11:11:26.5510102Z Relative difference: 1.0 2025-12-04T11:11:26.5510107Z 2025-12-04T11:11:26.5510318Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5511202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5511219Z 2025-12-04T11:11:26.5511484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5511662Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5511868Z ================== 1 failed, 10 deselected, 2 rerun in 4.98s =================== 2025-12-04T11:11:26.5511965Z Got exit code 1 2025-12-04T11:11:26.5512070Z Retrying single test... 
2025-12-04T11:11:26.5512523Z W1204 11:10:15.372000 93562 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5513173Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml 2025-12-04T11:11:26.5513352Z ============================= test session starts ============================== 2025-12-04T11:11:26.5513695Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5513802Z cachedir: .pytest_cache 2025-12-04T11:11:26.5514329Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5514450Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5514554Z configfile: pytest.ini 2025-12-04T11:11:26.5515097Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5515322Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5516300Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5516472Z Running 1 items in this shard 2025-12-04T11:11:26.5516477Z 2025-12-04T11:11:26.5517720Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:21.669252805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5517770Z 2025-12-04T11:11:26.5518283Z [W1204 11:10:36.580901945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5518319Z 2025-12-04T11:11:26.5518825Z [W1204 11:10:36.581156043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5518843Z 2025-12-04T11:11:26.5519348Z [W1204 11:10:36.588359578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5519353Z 2025-12-04T11:11:26.5519858Z [W1204 11:10:36.589085609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5519863Z 2025-12-04T11:11:26.5520378Z [W1204 11:10:36.589325368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5520383Z 2025-12-04T11:11:26.5520883Z [W1204 11:10:36.596324120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5520891Z 2025-12-04T11:11:26.5521404Z [W1204 11:10:36.596982002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5521409Z 2025-12-04T11:11:26.5521973Z [W1204 11:10:36.597164018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5521979Z 2025-12-04T11:11:26.5522495Z [W1204 11:10:37.733489534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5522499Z 2025-12-04T11:11:26.5522997Z [W1204 11:10:37.735277807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5523004Z 2025-12-04T11:11:26.5523515Z [W1204 11:10:37.735484020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5523522Z 2025-12-04T11:11:26.5524018Z [W1204 11:10:37.739483708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5524022Z 2025-12-04T11:11:26.5524519Z [W1204 11:10:37.740280866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5524524Z 2025-12-04T11:11:26.5525040Z [W1204 11:10:37.740501741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5525044Z 2025-12-04T11:11:26.5525542Z [W1204 11:10:37.746585813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5525549Z 2025-12-04T11:11:26.5526063Z [W1204 11:10:37.747284312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5526069Z 2025-12-04T11:11:26.5526563Z [W1204 11:10:37.747481151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5526568Z 2025-12-04T11:11:26.5526712Z ('RERUN', {'yellow': True}) [20.0187s] [100%] 2025-12-04T11:11:26.5528035Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:37.157497058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5528041Z 2025-12-04T11:11:26.5528556Z [W1204 11:10:37.158311954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5528591Z 2025-12-04T11:11:26.5529093Z [W1204 11:10:37.158510427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5529128Z 2025-12-04T11:11:26.5529629Z [W1204 11:10:37.162653752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5529649Z 2025-12-04T11:11:26.5530146Z [W1204 11:10:37.163322911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5530151Z 2025-12-04T11:11:26.5530657Z [W1204 11:10:37.163513297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5530663Z 2025-12-04T11:11:26.5531177Z [W1204 11:10:37.169600099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5531183Z 2025-12-04T11:11:26.5531680Z [W1204 11:10:37.170349961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5531685Z 2025-12-04T11:11:26.5532200Z [W1204 11:10:37.170542399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5532205Z 2025-12-04T11:11:26.5532699Z [W1204 11:10:37.260154908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5532705Z 2025-12-04T11:11:26.5533222Z [W1204 11:10:37.260951843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5533227Z 2025-12-04T11:11:26.5533724Z [W1204 11:10:37.261158272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5533731Z 2025-12-04T11:11:26.5534246Z [W1204 11:10:37.265119827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5534251Z 2025-12-04T11:11:26.5534749Z [W1204 11:10:37.265778382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5534757Z 2025-12-04T11:11:26.5535254Z [W1204 11:10:37.265969013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5535272Z 2025-12-04T11:11:26.5535776Z [W1204 11:10:37.271996753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5535781Z 2025-12-04T11:11:26.5536280Z [W1204 11:10:37.272845146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5536287Z 2025-12-04T11:11:26.5536802Z [W1204 11:10:37.273037451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5536807Z 2025-12-04T11:11:26.5536938Z ('RERUN', {'yellow': True}) [0.4856s] [100%] 2025-12-04T11:11:26.5538194Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:37.616179972 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5538199Z 2025-12-04T11:11:26.5538758Z [W1204 11:10:37.616938580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5538763Z 2025-12-04T11:11:26.5539280Z [W1204 11:10:37.617136425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5539312Z 2025-12-04T11:11:26.5539928Z [W1204 11:10:38.621229329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5539933Z 2025-12-04T11:11:26.5540433Z [W1204 11:10:38.621876317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5540489Z 2025-12-04T11:11:26.5541067Z [W1204 11:10:38.622064129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5541072Z 2025-12-04T11:11:26.5541656Z [W1204 11:10:38.628196395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5541661Z 2025-12-04T11:11:26.5542178Z [W1204 11:10:38.628875263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5542185Z 2025-12-04T11:11:26.5542684Z [W1204 11:10:38.629062907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5542689Z 2025-12-04T11:11:26.5543203Z [W1204 11:10:38.718976126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5543210Z 2025-12-04T11:11:26.5543708Z [W1204 11:10:38.719755108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5543713Z 2025-12-04T11:11:26.5544227Z [W1204 11:10:38.719958715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5544235Z 2025-12-04T11:11:26.5544732Z [W1204 11:10:38.723872454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5544737Z 2025-12-04T11:11:26.5545254Z [W1204 11:10:38.724502026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5545259Z 2025-12-04T11:11:26.5545756Z [W1204 11:10:38.724691670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5545763Z 2025-12-04T11:11:26.5546260Z [W1204 11:10:38.730638384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5546279Z 2025-12-04T11:11:26.5546778Z [W1204 11:10:38.731426634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5546783Z 2025-12-04T11:11:26.5547280Z [W1204 11:10:38.731616363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5547285Z 2025-12-04T11:11:26.5547395Z FAILED [0.4556s] [100%] 2025-12-04T11:11:26.5547402Z 2025-12-04T11:11:26.5547545Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5548053Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5548179Z Traceback (most recent call last): 2025-12-04T11:11:26.5548679Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5548920Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5549374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5549616Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5550158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5550358Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5550530Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5550536Z 2025-12-04T11:11:26.5550638Z Expected 1 but got 2. 
2025-12-04T11:11:26.5550740Z Absolute difference: 1 2025-12-04T11:11:26.5550862Z Relative difference: 1.0 2025-12-04T11:11:26.5550899Z 2025-12-04T11:11:26.5551108Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5552006Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5552012Z 2025-12-04T11:11:26.5552279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5552496Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5552621Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5553489Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5553726Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5553823Z graph_break [] 2025-12-04T11:11:26.5554036Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5555224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5555340Z if out == self.unknown_value: 2025-12-04T11:11:26.5556065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5556170Z warnings.warn( 2025-12-04T11:11:26.5556999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5557112Z warnings.warn( 2025-12-04T11:11:26.5557608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5557728Z Traceback (most recent call last): 2025-12-04T11:11:26.5558238Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5558466Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5558933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5559093Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5559617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5559837Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5559966Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5559972Z 2025-12-04T11:11:26.5560092Z Expected 1 but got 2. 
2025-12-04T11:11:26.5560195Z Absolute difference: 1 2025-12-04T11:11:26.5560304Z Relative difference: 1.0 2025-12-04T11:11:26.5560308Z 2025-12-04T11:11:26.5560531Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5561548Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5561556Z 2025-12-04T11:11:26.5561826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5562056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5562202Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5563084Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5563339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5563434Z graph_break [] 2025-12-04T11:11:26.5563663Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5564889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5565018Z if out == self.unknown_value: 2025-12-04T11:11:26.5565727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5565826Z warnings.warn( 2025-12-04T11:11:26.5566541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5566641Z warnings.warn( 2025-12-04T11:11:26.5566866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5566976Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5567200Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5568085Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5568181Z graph_break [] 2025-12-04T11:11:26.5568391Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5569112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5569211Z warnings.warn( 2025-12-04T11:11:26.5569932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5570027Z warnings.warn( 2025-12-04T11:11:26.5570168Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5570682Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5570801Z 
Traceback (most recent call last): 2025-12-04T11:11:26.5571311Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5571543Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5571992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5572166Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5572694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5572896Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5573039Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5573044Z 2025-12-04T11:11:26.5573148Z Expected 1 but got 2. 2025-12-04T11:11:26.5573266Z Absolute difference: 1 2025-12-04T11:11:26.5573470Z Relative difference: 1.0 2025-12-04T11:11:26.5573475Z 2025-12-04T11:11:26.5573685Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5574583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5574619Z 2025-12-04T11:11:26.5574881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5575140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5575253Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5576126Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5576364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.5576460Z graph_break [] 2025-12-04T11:11:26.5576671Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5577865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5577982Z if out == self.unknown_value: 2025-12-04T11:11:26.5578707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5578804Z warnings.warn( 2025-12-04T11:11:26.5579508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5579616Z warnings.warn( 2025-12-04T11:11:26.5579832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5579956Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5580180Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5581046Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5581154Z graph_break [] 2025-12-04T11:11:26.5581365Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5582085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5582181Z warnings.warn( 2025-12-04T11:11:26.5582888Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5582996Z warnings.warn( 2025-12-04T11:11:26.5583207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5583320Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5583555Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5584423Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5584530Z graph_break [] 2025-12-04T11:11:26.5584738Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5585447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5585631Z warnings.warn( 2025-12-04T11:11:26.5586334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5586471Z warnings.warn( 2025-12-04T11:11:26.5587296Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml - 2025-12-04T11:11:26.5587464Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5588422Z FAILED [0.4556s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.5588427Z 2025-12-04T11:11:26.5588529Z Expected 1 but got 2. 2025-12-04T11:11:26.5588648Z Absolute difference: 1 2025-12-04T11:11:26.5588756Z Relative difference: 1.0 2025-12-04T11:11:26.5588766Z 2025-12-04T11:11:26.5588981Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5589879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5589887Z 2025-12-04T11:11:26.5590149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5590343Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5590540Z ================== 1 failed, 10 deselected, 2 rerun in 20.99s ================== 2025-12-04T11:11:26.5590638Z Got exit code 1 2025-12-04T11:11:26.5590758Z Retrying single test... 2025-12-04T11:11:26.5591198Z W1204 11:10:49.546000 93763 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5591847Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml 2025-12-04T11:11:26.5592024Z ============================= test session starts ============================== 2025-12-04T11:11:26.5592372Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5592493Z cachedir: .pytest_cache 2025-12-04T11:11:26.5593002Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5593126Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5593244Z configfile: pytest.ini 2025-12-04T11:11:26.5593774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, 
subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5593987Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T11:11:26.5594964Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5595076Z Running 1 items in this shard 2025-12-04T11:11:26.5595083Z 2025-12-04T11:11:26.5596335Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:55.821215965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5596343Z 2025-12-04T11:11:26.5596850Z [W1204 11:11:10.292030303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5596855Z 2025-12-04T11:11:26.5597368Z [W1204 11:11:10.292286372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5597374Z 2025-12-04T11:11:26.5597933Z [W1204 11:11:10.299611925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5597938Z 2025-12-04T11:11:26.5598450Z [W1204 11:11:10.300372440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5598489Z 2025-12-04T11:11:26.5598991Z [W1204 11:11:10.300569683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5599025Z 2025-12-04T11:11:26.5599539Z [W1204 11:11:10.307707967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5599543Z 2025-12-04T11:11:26.5600038Z [W1204 11:11:10.308398037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5600043Z 2025-12-04T11:11:26.5600543Z [W1204 11:11:10.308584615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5600574Z 2025-12-04T11:11:26.5601286Z [W1204 11:11:10.443276451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5601294Z 2025-12-04T11:11:26.5601851Z [W1204 11:11:10.445016863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5601859Z 2025-12-04T11:11:26.5602377Z [W1204 11:11:10.445223560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5602382Z 2025-12-04T11:11:26.5602877Z [W1204 11:11:10.449091345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5602882Z 2025-12-04T11:11:26.5603398Z [W1204 11:11:10.449742303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5603403Z 2025-12-04T11:11:26.5603897Z [W1204 11:11:10.449936459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5603903Z 2025-12-04T11:11:26.5604412Z [W1204 11:11:10.455890000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5604419Z 2025-12-04T11:11:26.5604918Z [W1204 11:11:10.456532181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5604923Z 2025-12-04T11:11:26.5605421Z [W1204 11:11:10.456720159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5605440Z 2025-12-04T11:11:26.5605570Z ('RERUN', {'yellow': True}) [19.5562s] [100%] 2025-12-04T11:11:26.5606822Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:11:11.860333884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5606830Z 2025-12-04T11:11:26.5607343Z [W1204 11:11:11.861078100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5607350Z 2025-12-04T11:11:26.5607851Z [W1204 11:11:11.861273815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5607855Z 2025-12-04T11:11:26.5608372Z [W1204 11:11:11.865252509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5608376Z 2025-12-04T11:11:26.5609020Z [W1204 11:11:11.865889876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5609026Z 2025-12-04T11:11:26.5609539Z [W1204 11:11:11.866076822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5609589Z 2025-12-04T11:11:26.5610090Z [W1204 11:11:11.872191154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5610094Z 2025-12-04T11:11:26.5610662Z [W1204 11:11:11.872829745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5610667Z 2025-12-04T11:11:26.5611165Z [W1204 11:11:11.873013405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5611170Z 2025-12-04T11:11:26.5611671Z [W1204 11:11:11.961162884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5611690Z 2025-12-04T11:11:26.5612188Z [W1204 11:11:11.961921465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5612195Z 2025-12-04T11:11:26.5612691Z [W1204 11:11:11.962124034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5612696Z 2025-12-04T11:11:26.5613212Z [W1204 11:11:11.965964733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5613218Z 2025-12-04T11:11:26.5613720Z [W1204 11:11:11.966605255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5613724Z 2025-12-04T11:11:26.5614240Z [W1204 11:11:11.966794595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5614246Z 2025-12-04T11:11:26.5614743Z [W1204 11:11:11.972726591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5614750Z 2025-12-04T11:11:26.5615261Z [W1204 11:11:11.973523700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5615266Z 2025-12-04T11:11:26.5615762Z [W1204 11:11:11.973713672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5615769Z 2025-12-04T11:11:26.5615913Z ('RERUN', {'yellow': True}) [0.4788s] [100%] 2025-12-04T11:11:26.5617156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:11:11.313517402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5617162Z 2025-12-04T11:11:26.5617660Z [W1204 11:11:11.314256682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5617667Z 2025-12-04T11:11:26.5618180Z [W1204 11:11:11.314454074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5618185Z 2025-12-04T11:11:26.5618684Z [W1204 11:11:11.318404189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5618691Z 2025-12-04T11:11:26.5619198Z [W1204 11:11:11.319037525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5619202Z 2025-12-04T11:11:26.5619770Z [W1204 11:11:11.319225803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5619775Z 2025-12-04T11:11:26.5620294Z [W1204 11:11:11.325315639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5620331Z 2025-12-04T11:11:26.5620829Z [W1204 11:11:11.325963406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5620834Z 2025-12-04T11:11:26.5621344Z [W1204 11:11:11.326160273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5621381Z 2025-12-04T11:11:26.5621877Z [W1204 11:11:11.412730046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5621882Z 2025-12-04T11:11:26.5622381Z [W1204 11:11:11.413521404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5622402Z 2025-12-04T11:11:26.5622897Z [W1204 11:11:11.413727614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5622901Z 2025-12-04T11:11:26.5623400Z [W1204 11:11:11.417674313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5623405Z 2025-12-04T11:11:26.5623911Z [W1204 11:11:11.418344760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5623918Z 2025-12-04T11:11:26.5624417Z [W1204 11:11:11.418539760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5624421Z 2025-12-04T11:11:26.5624931Z [W1204 11:11:11.424531634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:11:26.5624936Z 2025-12-04T11:11:26.5625438Z [W1204 11:11:11.425383528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5625442Z 2025-12-04T11:11:26.5625950Z [W1204 11:11:11.425575465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:11:26.5625957Z 2025-12-04T11:11:26.5626053Z FAILED [0.4510s] [100%] 2025-12-04T11:11:26.5626058Z 2025-12-04T11:11:26.5626197Z ==================================== RERUNS ==================================== 2025-12-04T11:11:26.5626714Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5626834Z Traceback (most recent call last): 2025-12-04T11:11:26.5627352Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5627579Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5628036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5628212Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5628742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5628961Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5629093Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5629100Z 2025-12-04T11:11:26.5629202Z Expected 1 but got 2. 
2025-12-04T11:11:26.5629323Z Absolute difference: 1 2025-12-04T11:11:26.5629433Z Relative difference: 1.0 2025-12-04T11:11:26.5629438Z 2025-12-04T11:11:26.5629650Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5630621Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5630627Z 2025-12-04T11:11:26.5630892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5631150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5631262Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5632135Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5632407Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5632506Z graph_break [] 2025-12-04T11:11:26.5632735Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5633925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5634039Z if out == self.unknown_value: 2025-12-04T11:11:26.5634767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5634867Z warnings.warn( 2025-12-04T11:11:26.5635582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5635680Z warnings.warn( 2025-12-04T11:11:26.5636172Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5636300Z Traceback (most recent call last): 2025-12-04T11:11:26.5636800Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5637025Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5637486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5637647Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5638181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5638387Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5638517Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5638522Z 2025-12-04T11:11:26.5638636Z Expected 1 but got 2. 
2025-12-04T11:11:26.5638739Z Absolute difference: 1 2025-12-04T11:11:26.5638854Z Relative difference: 1.0 2025-12-04T11:11:26.5638859Z 2025-12-04T11:11:26.5639069Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5639956Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5639961Z 2025-12-04T11:11:26.5640239Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5640457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5640585Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5641514Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5641745Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5641861Z graph_break [] 2025-12-04T11:11:26.5642075Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5643350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:11:26.5643494Z if out == self.unknown_value: 2025-12-04T11:11:26.5644205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5644353Z warnings.warn( 2025-12-04T11:11:26.5645058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5645154Z warnings.warn( 2025-12-04T11:11:26.5645384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5645495Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5645734Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5646602Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5646697Z graph_break [] 2025-12-04T11:11:26.5646921Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5647629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5647742Z warnings.warn( 2025-12-04T11:11:26.5648442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5648537Z warnings.warn( 2025-12-04T11:11:26.5648693Z =================================== FAILURES =================================== 2025-12-04T11:11:26.5649185Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T11:11:26.5649305Z 
Traceback (most recent call last): 2025-12-04T11:11:26.5649813Z File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda 2025-12-04T11:11:26.5650039Z self.assertEqual(counters["inductor"]["woq_matcher_count"], 1) 2025-12-04T11:11:26.5650503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual 2025-12-04T11:11:26.5650663Z return super().assertEqual(x, y, *args, **kwargs) 2025-12-04T11:11:26.5651189Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual 2025-12-04T11:11:26.5651402Z raise error_metas.pop()[0].to_error( # type: ignore[index] 2025-12-04T11:11:26.5651535Z AssertionError: Scalars are not equal! 2025-12-04T11:11:26.5651541Z 2025-12-04T11:11:26.5651654Z Expected 1 but got 2. 2025-12-04T11:11:26.5651758Z Absolute difference: 1 2025-12-04T11:11:26.5651863Z Relative difference: 1.0 2025-12-04T11:11:26.5651870Z 2025-12-04T11:11:26.5652093Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5652972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5652980Z 2025-12-04T11:11:26.5653244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5653468Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5653579Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5654521Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5654745Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), 
('not_ok', 2)] 2025-12-04T11:11:26.5654959Z graph_break [] 2025-12-04T11:11:26.5655184Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5656359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:11:26.5656520Z if out == self.unknown_value: 2025-12-04T11:11:26.5657227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5657323Z warnings.warn( 2025-12-04T11:11:26.5658048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5658144Z warnings.warn( 2025-12-04T11:11:26.5658369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5658482Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5658704Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5659632Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5659731Z graph_break [] 2025-12-04T11:11:26.5659943Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5660665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5660762Z warnings.warn( 2025-12-04T11:11:26.5661479Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5661577Z warnings.warn( 2025-12-04T11:11:26.5661787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:11:26.5661913Z stats [('calls_captured', 6)] 2025-12-04T11:11:26.5662136Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)] 2025-12-04T11:11:26.5663016Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)] 2025-12-04T11:11:26.5663110Z graph_break [] 2025-12-04T11:11:26.5663320Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:11:26.5664049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5664150Z warnings.warn( 2025-12-04T11:11:26.5664852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:11:26.5664960Z warnings.warn( 2025-12-04T11:11:26.5665776Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml - 2025-12-04T11:11:26.5665963Z =========================== short test summary info ============================ 2025-12-04T11:11:26.5666939Z FAILED [0.4510s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal! 
2025-12-04T11:11:26.5666946Z 2025-12-04T11:11:26.5667061Z Expected 1 but got 2. 2025-12-04T11:11:26.5667166Z Absolute difference: 1 2025-12-04T11:11:26.5667273Z Relative difference: 1.0 2025-12-04T11:11:26.5667278Z 2025-12-04T11:11:26.5667540Z To execute this test, run the following from the base repo dir: 2025-12-04T11:11:26.5668421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5668458Z 2025-12-04T11:11:26.5668723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:11:26.5668913Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:11:26.5669107Z ================== 1 failed, 10 deselected, 2 rerun in 20.52s ================== 2025-12-04T11:11:26.5669213Z Got exit code 1 2025-12-04T11:11:26.5670023Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T11:11:26.5670427Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:11:26.5670879Z W1204 11:11:23.200000 93964 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:11:26.5671526Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml 2025-12-04T11:11:26.5671705Z ============================= test session starts ============================== 2025-12-04T11:11:26.5672048Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:11:26.5672154Z cachedir: .pytest_cache 2025-12-04T11:11:26.5672677Z hypothesis profile 'pytorch_ci' -> database=None, 
max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:11:26.5672798Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:11:26.5672903Z configfile: pytest.ini 2025-12-04T11:11:26.5673447Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:11:26.5673664Z collecting ... collected 58 items / 11 deselected / 47 selected 2025-12-04T11:11:26.5673817Z stepcurrent: skipping 11 already run items. 2025-12-04T11:11:26.5673928Z Running 0 items in this shard 2025-12-04T11:11:26.5673935Z 2025-12-04T11:11:26.5674768Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml - 2025-12-04T11:11:26.5674947Z ============================ 11 deselected in 0.02s ============================ 2025-12-04T11:11:26.5683547Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16'] 2025-12-04T11:11:26.5683613Z 2025-12-04T11:11:26.5684251Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 4/5 (test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log) 2025-12-04T11:11:26.5684257Z 2025-12-04T11:11:26.5684658Z Finished inductor/test_cuda_select_algorithm 4/5 ... 
[2025-12-04 11:11:26.269401][7043.879300637], took 16.03min 2025-12-04T11:11:26.5685540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml 2025-12-04T11:11:26.5686434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml 2025-12-04T11:11:26.5687309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml 2025-12-04T11:11:26.5688193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml 2025-12-04T11:11:26.5689067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml 2025-12-04T11:11:26.5689986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml 2025-12-04T11:11:26.5690854Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml 2025-12-04T11:11:26.5960380Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml 2025-12-04T11:11:26.6301239Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml 2025-12-04T11:11:26.6615752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml 2025-12-04T11:11:26.6966662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml 2025-12-04T11:11:26.7282693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml 2025-12-04T11:11:26.7578162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml 2025-12-04T11:11:26.7929962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml 2025-12-04T11:11:26.8239319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml 2025-12-04T11:11:26.8560350Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml 2025-12-04T11:11:26.8876953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml 2025-12-04T11:11:26.9201296Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml 2025-12-04T11:11:26.9678268Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml 2025-12-04T11:11:26.9992832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml 2025-12-04T11:11:27.0329314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml 2025-12-04T11:11:27.0596937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml 2025-12-04T11:11:27.0915220Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml 2025-12-04T11:11:27.1221571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml 2025-12-04T11:11:27.2064848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml 2025-12-04T11:11:27.2377141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml 2025-12-04T11:11:27.2699223Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml 2025-12-04T11:11:27.3020490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml 2025-12-04T11:11:27.3308733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml 2025-12-04T11:11:27.3601387Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml 2025-12-04T11:11:27.3924223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml 2025-12-04T11:11:27.4231702Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml 2025-12-04T11:11:27.4530622Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml 2025-12-04T11:11:27.4836938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml 2025-12-04T11:11:27.9082924Z Uploading logs for 57119749427 to S3 2025-12-04T11:11:28.0854722Z Uploading artifacts took 0.58 seconds 2025-12-04T11:11:28.0855137Z inductor/test_cuda_select_algorithm 4/5 failed! 2025-12-04T11:11:28.0859539Z Running inductor/test_deterministic 1/8 ... 
[2025-12-04 11:11:28.085779][7045.695685551] 2025-12-04T11:11:28.0860118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:11:28.0864878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=1', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:11:28.086241] 2025-12-04T11:11:37.8151842Z 2025-12-04T11:11:37.8152811Z inductor/test_deterministic 1/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_1.8_262bcacfdd50a1f9_.log 2025-12-04T11:11:37.8155983Z Running 3 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_float16 2025-12-04T11:11:37.8158430Z 2025-12-04T11:11:37.8158802Z Finished inductor/test_deterministic 1/8 ... [2025-12-04 11:11:37.814980][7055.424886886], took 0.16min 2025-12-04T11:11:37.8235930Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.xml 2025-12-04T11:11:37.8992281Z Running inductor/test_deterministic 6/8 ... [2025-12-04 11:11:37.898915][7055.508822753] 2025-12-04T11:11:37.8992897Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:11:37.8995813Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=6', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:11:37.899342] 2025-12-04T11:13:00.5350614Z 2025-12-04T11:13:00.5351893Z inductor/test_deterministic 6/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_6.8_b1bfd086dab71470_.log 2025-12-04T11:13:00.5354579Z Running 2 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_training_precision_bfloat16 2025-12-04T11:13:00.5356470Z 2025-12-04T11:13:00.5356878Z Finished inductor/test_deterministic 6/8 ... [2025-12-04 11:13:00.534832][7138.144741414], took 1.38min 2025-12-04T11:13:00.5435879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.xml 2025-12-04T11:13:00.6122887Z Running inductor/test_extension_backend 1/1 ... [2025-12-04 11:13:00.611950][7138.221857845] 2025-12-04T11:13:00.6123493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:13:00.6126192Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_extension_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:00.612374] 2025-12-04T11:13:16.2995610Z 2025-12-04T11:13:16.2996778Z inductor/test_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_extension_backend_1.1_057698d7e9793b3b_.log 2025-12-04T11:13:16.2998434Z Running 1 items in this shard: test/inductor/test_extension_backend.py::ExtensionBackendTests::test_open_device_registration 2025-12-04T11:13:16.2999145Z 2025-12-04T11:13:16.2999621Z Finished inductor/test_extension_backend 1/1 ... 
[2025-12-04 11:13:16.299304][7153.909213266], took 0.26min 2025-12-04T11:13:16.3080752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.xml 2025-12-04T11:13:16.3953562Z Running inductor/test_native_matmul 1/2 ... [2025-12-04 11:13:16.395011][7154.004918051] 2025-12-04T11:13:16.3954155Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:13:16.3957073Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_native_matmul.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:16.395448] 2025-12-04T11:23:38.4956219Z 2025-12-04T11:23:38.4957168Z PRINTING LOG FILE of inductor/test_native_matmul 1/2 (test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log) 2025-12-04T11:23:38.4958529Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml 2025-12-04T11:23:38.4959508Z ============================= test session starts ============================== 2025-12-04T11:23:38.4960246Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:23:38.4960967Z cachedir: .pytest_cache 2025-12-04T11:23:38.4961742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:23:38.4962643Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:23:38.4962990Z configfile: pytest.ini 2025-12-04T11:23:38.4963829Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:23:38.4964752Z collecting ... 
collected 8 items 2025-12-04T11:23:38.4965217Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:23:38.4968139Z Running 6 items in this shard: test/inductor/test_native_matmul.py::TestTritonDotReduction::test_3mm_add, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_1d_expand, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_2_expand, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_complex 2025-12-04T11:23:38.4970832Z 2025-12-04T11:23:38.4971279Z inductor/test_native_matmul.py::TestTritonDotReduction::test_3mm_add PASSED [119.1228s] [ 16%] 2025-12-04T11:23:38.4972228Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul PASSED [24.2448s] [ 33%] 2025-12-04T11:23:38.4973675Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:16:24.956000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.4974821Z ('RERUN', {'yellow': True}) [36.4116s] [ 50%] 2025-12-04T11:23:38.4976000Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:17:01.365000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.4977136Z ('RERUN', {'yellow': True}) [36.3815s] [ 50%] 2025-12-04T11:23:38.4978297Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:17:37.776000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.4979816Z FAILED [36.4097s] [ 50%] 2025-12-04T11:23:38.4980068Z 2025-12-04T11:23:38.4980212Z ==================================== RERUNS ==================================== 2025-12-04T11:23:38.4980837Z ___________________ TestTritonDotReduction.test_matmul_fp16 
____________________ 2025-12-04T11:23:38.4981504Z Traceback (most recent call last): 2025-12-04T11:23:38.4982170Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.4982923Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.4983692Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.4984438Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.4985081Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.4985738Z raise self.failureException(msg) 2025-12-04T11:23:38.4986106Z AssertionError: False is not true 2025-12-04T11:23:38.4986410Z 2025-12-04T11:23:38.4986625Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.4987517Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.4988220Z 2025-12-04T11:23:38.4988483Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.4989191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.4989723Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.4990105Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.4990766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.4992476Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.4993956Z graph_break [] 2025-12-04T11:23:38.4994372Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.4994960Z Traceback 
(most recent call last): 2025-12-04T11:23:38.4995610Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.4996286Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.4996895Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.4997557Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.4998115Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.4998772Z raise self.failureException(msg) 2025-12-04T11:23:38.4999135Z AssertionError: False is not true 2025-12-04T11:23:38.4999355Z 2025-12-04T11:23:38.4999569Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5000393Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5001210Z 2025-12-04T11:23:38.5001474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5002158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5002614Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5002987Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5003583Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5005208Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5006839Z graph_break [] 2025-12-04T11:23:38.5007216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5007678Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5008034Z stats 
[('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5008668Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5010306Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5011766Z graph_break [] 2025-12-04T11:23:38.5012064Z =================================== FAILURES =================================== 2025-12-04T11:23:38.5012595Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5013115Z Traceback (most recent call last): 2025-12-04T11:23:38.5013782Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5014445Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5015064Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5015724Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5016305Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5016875Z raise self.failureException(msg) 2025-12-04T11:23:38.5017240Z AssertionError: False is not true 2025-12-04T11:23:38.5017464Z 2025-12-04T11:23:38.5017692Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5018508Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5019215Z 2025-12-04T11:23:38.5019481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5020101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5020565Z frames [('total', 
1), ('ok', 1)] 2025-12-04T11:23:38.5020930Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5021520Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5023147Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5024572Z graph_break [] 2025-12-04T11:23:38.5024922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5025385Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5025753Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5026328Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5027943Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5029348Z graph_break [] 2025-12-04T11:23:38.5029717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5030167Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5030529Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5031116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5032812Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), 
('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5034222Z graph_break [] 2025-12-04T11:23:38.5035149Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml - 2025-12-04T11:23:38.5036193Z =========================== short test summary info ============================ 2025-12-04T11:23:38.5037054Z FAILED [36.4097s] inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true 2025-12-04T11:23:38.5037698Z 2025-12-04T11:23:38.5037908Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5038729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5039352Z 2025-12-04T11:23:38.5039614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5040192Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:23:38.5040689Z =============== 1 failed, 2 passed, 2 rerun in 252.60s (0:04:12) =============== 2025-12-04T11:23:38.5041123Z Got exit code 1 2025-12-04T11:23:38.5041384Z Retrying single test... 
2025-12-04T11:23:38.5042187Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml 2025-12-04T11:23:38.5043064Z ============================= test session starts ============================== 2025-12-04T11:23:38.5043771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:23:38.5044367Z cachedir: .pytest_cache 2025-12-04T11:23:38.5045055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:23:38.5045918Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:23:38.5046269Z configfile: pytest.ini 2025-12-04T11:23:38.5047023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:23:38.5047924Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T11:23:38.5048817Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 2025-12-04T11:23:38.5049624Z Running 1 items in this shard 2025-12-04T11:23:38.5049832Z 2025-12-04T11:23:38.5050716Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:18:29.785385087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:23:38.5051713Z 2025-12-04T11:23:38.5052149Z E1204 11:18:45.387000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5052858Z ('RERUN', {'yellow': True}) [56.7083s] [100%] 2025-12-04T11:23:38.5053984Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:19:21.109170003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:23:38.5054979Z 2025-12-04T11:23:38.5055424Z E1204 11:19:21.670000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5056104Z ('RERUN', {'yellow': True}) [36.1587s] [100%] 2025-12-04T11:23:38.5057227Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:19:57.218156732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:23:38.5058228Z 2025-12-04T11:23:38.5058661Z E1204 11:19:57.776000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5059415Z FAILED [36.1040s] [100%] 2025-12-04T11:23:38.5059599Z 2025-12-04T11:23:38.5059737Z ==================================== RERUNS ==================================== 2025-12-04T11:23:38.5060283Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5061682Z Traceback (most recent call last): 2025-12-04T11:23:38.5062344Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5063004Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5063651Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5064311Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5064871Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5065448Z raise self.failureException(msg) 2025-12-04T11:23:38.5065815Z AssertionError: False is not true 2025-12-04T11:23:38.5066035Z 2025-12-04T11:23:38.5066262Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5067070Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5067692Z 2025-12-04T11:23:38.5067951Z This message 
can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5068574Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5069024Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5069395Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5070756Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5072275Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5072811Z graph_break [] 2025-12-04T11:23:38.5073185Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5074789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:23:38.5076440Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5077100Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5077613Z Traceback (most recent call last): 2025-12-04T11:23:38.5078266Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5078937Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5079535Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5080190Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5080756Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5081318Z raise self.failureException(msg) 2025-12-04T11:23:38.5081678Z AssertionError: False is not true 2025-12-04T11:23:38.5081896Z 2025-12-04T11:23:38.5082193Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5083027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5083641Z 2025-12-04T11:23:38.5083904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5084530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5084994Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5085352Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5086803Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5088354Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5088897Z 
graph_break [] 2025-12-04T11:23:38.5089254Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5090892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:23:38.5092533Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5093164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5093619Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5093989Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5094578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5096207Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5097716Z graph_break [] 2025-12-04T11:23:38.5098016Z =================================== FAILURES =================================== 2025-12-04T11:23:38.5098560Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5099070Z Traceback (most recent call last): 2025-12-04T11:23:38.5099717Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5100388Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5101149Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5101802Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5102373Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5102953Z raise self.failureException(msg) 2025-12-04T11:23:38.5103320Z AssertionError: False is not true 2025-12-04T11:23:38.5103541Z 2025-12-04T11:23:38.5103752Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5104574Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5105184Z 2025-12-04T11:23:38.5105459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5106074Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5106541Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5106917Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5108280Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5109775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5110310Z graph_break [] 2025-12-04T11:23:38.5110677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5112405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:23:38.5114038Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5114665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5115171Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5115543Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5116122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5117792Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5119201Z graph_break [] 2025-12-04T11:23:38.5119570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5120025Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5120403Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5120992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5122669Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5124081Z graph_break [] 2025-12-04T11:23:38.5124983Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml - 2025-12-04T11:23:38.5126027Z =========================== short test summary info ============================ 2025-12-04T11:23:38.5126854Z FAILED [36.1040s] 
inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true 2025-12-04T11:23:38.5127510Z 2025-12-04T11:23:38.5127718Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5128538Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5129149Z 2025-12-04T11:23:38.5129422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5129985Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:23:38.5130511Z ============= 1 failed, 5 deselected, 2 rerun in 129.00s (0:02:09) ============= 2025-12-04T11:23:38.5130954Z Got exit code 1 2025-12-04T11:23:38.5131215Z Retrying single test... 2025-12-04T11:23:38.5131960Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml 2025-12-04T11:23:38.5132820Z ============================= test session starts ============================== 2025-12-04T11:23:38.5133461Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:23:38.5134032Z cachedir: .pytest_cache 2025-12-04T11:23:38.5134728Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:23:38.5135493Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:23:38.5135834Z configfile: pytest.ini 2025-12-04T11:23:38.5136577Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:23:38.5137560Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T11:23:38.5138441Z stepcurrent: skipping 2 already run items. 
Running only test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 2025-12-04T11:23:38.5139240Z Running 1 items in this shard 2025-12-04T11:23:38.5139445Z 2025-12-04T11:23:38.5140410Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:20:48.605987339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:23:38.5141440Z 2025-12-04T11:23:38.5141894Z E1204 11:21:05.123000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5142588Z ('RERUN', {'yellow': True}) [56.5570s] [100%] 2025-12-04T11:23:38.5143715Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:21:41.854814777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:23:38.5144751Z 2025-12-04T11:23:38.5145186Z E1204 11:21:41.416000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5145888Z ('RERUN', {'yellow': True}) [36.1641s] [100%] 2025-12-04T11:23:38.5147001Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:22:17.014751479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:23:38.5148004Z 2025-12-04T11:23:38.5148437Z E1204 11:22:17.572000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001 2025-12-04T11:23:38.5149107Z FAILED [36.1550s] [100%] 2025-12-04T11:23:38.5149287Z 2025-12-04T11:23:38.5149438Z ==================================== RERUNS ==================================== 2025-12-04T11:23:38.5149966Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5150486Z Traceback (most recent call last): 2025-12-04T11:23:38.5151153Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5151828Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5152431Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5153089Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5153662Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5154229Z raise self.failureException(msg) 2025-12-04T11:23:38.5154592Z AssertionError: False is not true 2025-12-04T11:23:38.5154813Z 2025-12-04T11:23:38.5155056Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5155879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5156497Z 2025-12-04T11:23:38.5156757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5157378Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5157846Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5158201Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5159569Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 
4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5161088Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5161641Z graph_break [] 2025-12-04T11:23:38.5161997Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5163671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 2025-12-04T11:23:38.5165323Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5166116Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5166622Z Traceback (most recent call last): 2025-12-04T11:23:38.5167286Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5168003Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5168619Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5169273Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5169881Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5170464Z raise self.failureException(msg) 2025-12-04T11:23:38.5170816Z AssertionError: False is not true 2025-12-04T11:23:38.5171048Z 2025-12-04T11:23:38.5171256Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5172071Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5172682Z 2025-12-04T11:23:38.5172954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 
2025-12-04T11:23:38.5173553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5174022Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5174389Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5175747Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5177246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5177790Z graph_break [] 2025-12-04T11:23:38.5178155Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5179760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:23:38.5181398Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5182025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5182489Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5182846Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5183435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5185067Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5186484Z graph_break [] 2025-12-04T11:23:38.5186768Z =================================== FAILURES =================================== 2025-12-04T11:23:38.5187311Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________ 2025-12-04T11:23:38.5187831Z Traceback (most recent call last): 2025-12-04T11:23:38.5188489Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16 2025-12-04T11:23:38.5189150Z self._check_equal(f, (x, y)) 2025-12-04T11:23:38.5189757Z File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal 2025-12-04T11:23:38.5190412Z self.assertTrue(same(expect, actual)) 2025-12-04T11:23:38.5190970Z File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue 2025-12-04T11:23:38.5191546Z raise self.failureException(msg) 2025-12-04T11:23:38.5191907Z AssertionError: False is not true 2025-12-04T11:23:38.5192131Z 2025-12-04T11:23:38.5192439Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5193256Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py 
TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5193911Z 2025-12-04T11:23:38.5194173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5194794Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5195246Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5195645Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5197004Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5198714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5199250Z graph_break [] 2025-12-04T11:23:38.5199620Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:23:38.5201391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.) 
2025-12-04T11:23:38.5203108Z (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel( 2025-12-04T11:23:38.5203727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5204195Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5204568Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5205158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5206775Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5208189Z graph_break [] 2025-12-04T11:23:38.5208553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:23:38.5209014Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:23:38.5209365Z stats [('calls_captured', 2), ('unique_graphs', 1)] 2025-12-04T11:23:38.5209955Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:23:38.5211570Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:23:38.5212970Z graph_break [] 2025-12-04T11:23:38.5213855Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml - 2025-12-04T11:23:38.5214892Z =========================== short test summary info ============================ 2025-12-04T11:23:38.5215728Z FAILED [36.1550s] 
inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true 2025-12-04T11:23:38.5216369Z 2025-12-04T11:23:38.5216593Z To execute this test, run the following from the base repo dir: 2025-12-04T11:23:38.5217405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16 2025-12-04T11:23:38.5218025Z 2025-12-04T11:23:38.5218288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:23:38.5218867Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:23:38.5219524Z ============= 1 failed, 5 deselected, 2 rerun in 128.90s (0:02:08) ============= 2025-12-04T11:23:38.5219977Z Got exit code 1 2025-12-04T11:23:38.5220532Z FAILED CONSISTENTLY: test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 2025-12-04T11:23:38.5221511Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:23:38.5222603Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml 2025-12-04T11:23:38.5223511Z ============================= test session starts ============================== 2025-12-04T11:23:38.5224154Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:23:38.5224734Z cachedir: .pytest_cache 2025-12-04T11:23:38.5225415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:23:38.5226182Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:23:38.5226529Z configfile: pytest.ini 2025-12-04T11:23:38.5227270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:23:38.5228188Z collecting ... 
collected 8 items / 3 deselected / 5 selected 2025-12-04T11:23:38.5228663Z stepcurrent: skipping 3 already run items. 2025-12-04T11:23:38.5229039Z Running 3 items in this shard 2025-12-04T11:23:38.5229242Z 2025-12-04T11:23:38.5229639Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_1d_expand PASSED [27.8324s] [ 33%] 2025-12-04T11:23:38.5230557Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_2_expand PASSED [13.6980s] [ 66%] 2025-12-04T11:23:38.5231465Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_complex PASSED [26.4024s] [100%] 2025-12-04T11:23:38.5231978Z 2025-12-04T11:23:38.5232733Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml - 2025-12-04T11:23:38.5233792Z ================== 3 passed, 3 deselected in 67.96s (0:01:07) ================== 2025-12-04T11:23:38.5234636Z The following tests failed consistently: ['test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16'] 2025-12-04T11:23:38.5235275Z 2025-12-04T11:23:38.5235841Z FINISHED PRINTING LOG FILE of inductor/test_native_matmul 1/2 (test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log) 2025-12-04T11:23:38.5236525Z 2025-12-04T11:23:38.5236892Z Finished inductor/test_native_matmul 1/2 ... 
[2025-12-04 11:23:38.495378][7776.105283319], took 10.37min 2025-12-04T11:23:38.5238154Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml 2025-12-04T11:23:38.5798255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml 2025-12-04T11:23:38.6118434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml 2025-12-04T11:23:38.6419370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml 2025-12-04T11:23:39.0400394Z Uploading logs for 57119749427 to S3 2025-12-04T11:23:39.1048343Z Uploading artifacts took 0.42 seconds 2025-12-04T11:23:39.1048727Z inductor/test_native_matmul 1/2 failed! 2025-12-04T11:23:39.1053412Z Running dynamo/test_fx_graph_runnable 1/1 ... [2025-12-04 11:23:39.105178][7776.715086643] 2025-12-04T11:23:39.1054077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:23:39.1058946Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_graph_runnable.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:23:39.105615] 2025-12-04T11:26:34.4791980Z 2025-12-04T11:26:34.4793148Z dynamo/test_fx_graph_runnable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_graph_runnable_1.1_bc88b60e43fe7f12_.log 2025-12-04T11:26:34.4802797Z Running 17 items in this shard: test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_gather_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_reduce_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_basic_tensor_add, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_add_dynamic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dtensor_compile_redistribute, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dynamic_expression, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dynamic_shapes_run, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_metrics_context, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_reduce_scatter_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_scalar_multiply, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_basic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_batch_processing, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_dynamic_batch, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_two_inputs_matmul, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel_autotune 2025-12-04T11:26:34.4811299Z 2025-12-04T11:26:34.4811672Z Finished dynamo/test_fx_graph_runnable 1/1 ... 
[2025-12-04 11:26:34.478961][7952.088870292], took 2.92min 2025-12-04T11:26:34.4885625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.xml 2025-12-04T11:26:34.7188956Z Running inductor/test_memory 1/1 ... [2025-12-04 11:26:34.718580][7952.328487815] 2025-12-04T11:26:34.7189499Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:26:34.7192502Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:26:34.719015] 2025-12-04T11:27:59.7004184Z 2025-12-04T11:27:59.7005219Z PRINTING LOG FILE of inductor/test_memory 1/1 (test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log) 2025-12-04T11:27:59.7006522Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml 2025-12-04T11:27:59.7007771Z ============================= test session starts ============================== 2025-12-04T11:27:59.7008898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:59.7009796Z cachedir: .pytest_cache 2025-12-04T11:27:59.7010767Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:59.7011918Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:27:59.7012446Z configfile: pytest.ini 2025-12-04T11:27:59.7013731Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:27:59.7015126Z collecting ... 
collected 8 items 2025-12-04T11:27:59.7015696Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:27:59.7021536Z Running 8 items in this shard: test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusing_reductions_increase_peak_memory, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusion_acc_large_reads, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_multiple_mutations_of_buf, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_bfs, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_dfs, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_lpmf 2025-12-04T11:27:59.7025859Z 2025-12-04T11:27:59.7026865Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusing_reductions_increase_peak_memory W1204 11:26:47.605000 100522 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:27:59.7028104Z PASSED [5.4475s] [ 12%] 2025-12-04T11:27:59.7028757Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusion_acc_large_reads PASSED [1.5106s] [ 25%] 2025-12-04T11:27:59.7029834Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_multiple_mutations_of_buf PASSED [0.6490s] [ 37%] 2025-12-04T11:27:59.7031025Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.6213s] [ 50%] 2025-12-04T11:27:59.7032297Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5562s] [ 50%] 2025-12-04T11:27:59.7033494Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation 
FAILED [1.7637s] [ 50%] 2025-12-04T11:27:59.7034104Z 2025-12-04T11:27:59.7034256Z ==================================== RERUNS ==================================== 2025-12-04T11:27:59.7034868Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.7035434Z Traceback (most recent call last): 2025-12-04T11:27:59.7036141Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.7036921Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.7038524Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.7040177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7040647Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.7040994Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7041410Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7042209Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7072598Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function 
`BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7103121Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7121226Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7140585Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7142307Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7143386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.7144324Z warnings.warn( 2025-12-04T11:27:59.7145197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7146144Z warnings.warn( 2025-12-04T11:27:59.7147012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7147939Z warnings.warn( 2025-12-04T11:27:59.7149466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7151479Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7153253Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7154752Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7156610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7158678Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.7160478Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7162005Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7162776Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.7163357Z Traceback (most recent call last): 2025-12-04T11:27:59.7164065Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.7164844Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.7166760Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.7168430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7168907Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.7169259Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7169829Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7170525Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7201170Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this 
graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7232381Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7250422Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7270365Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7272175Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7273262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7274216Z warnings.warn( 2025-12-04T11:27:59.7275080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7276021Z warnings.warn( 2025-12-04T11:27:59.7277074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7278034Z warnings.warn( 2025-12-04T11:27:59.7279561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7281628Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7283491Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7285063Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7286929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.7288999Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7290760Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7292250Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7292844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7293315Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.7293642Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7294072Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7294758Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7325518Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to 
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7355568Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7373457Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7392678Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7394396Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7395479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping
2025-12-04T11:27:59.7396425Z warnings.warn(
2025-12-04T11:27:59.7397292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7398242Z warnings.warn(
2025-12-04T11:27:59.7399113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7400041Z warnings.warn(
2025-12-04T11:27:59.7401828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7403911Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7405731Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7407272Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7409069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7411133Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7412895Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7414383Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7414895Z =================================== FAILURES ===================================
2025-12-04T11:27:59.7415504Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7416079Z Traceback (most recent call last):
2025-12-04T11:27:59.7416776Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7417547Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7419164Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7420812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7421282Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7421618Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7422048Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7422735Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7453165Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method
UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7483411Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7501520Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7520842Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7522656Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7523741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7524683Z warnings.warn(
2025-12-04T11:27:59.7525563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7526510Z warnings.warn(
2025-12-04T11:27:59.7527364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7528314Z warnings.warn(
2025-12-04T11:27:59.7529851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7531861Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7533638Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7535124Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7536932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7539087Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7540857Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7542390Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7543003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7543473Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7543823Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7544239Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7544932Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7579488Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7610113Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7628077Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7647365Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7649140Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7650222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.7651222Z warnings.warn( 2025-12-04T11:27:59.7652082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7653057Z warnings.warn( 2025-12-04T11:27:59.7653918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7654848Z warnings.warn( 2025-12-04T11:27:59.7656376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7658376Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7660152Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7661646Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7663444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7665500Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.7667261Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7668756Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7669350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7669802Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.7670148Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7670578Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7671261Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7701951Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7732199Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7750025Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7769343Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7771061Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7772123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.7773077Z warnings.warn( 2025-12-04T11:27:59.7773951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7774895Z warnings.warn( 2025-12-04T11:27:59.7775744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7776681Z warnings.warn( 2025-12-04T11:27:59.7778209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7780217Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7782030Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7783531Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7785383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7787439Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.7789231Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7790743Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7791781Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml - 2025-12-04T11:27:59.7792746Z =========================== short test summary info ============================ 2025-12-04T11:27:59.7794938Z FAILED [1.7637s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.7797086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:27:59.7797581Z ==================== 1 failed, 3 passed, 2 rerun in 12.58s ===================== 2025-12-04T11:27:59.7798008Z Got exit code 1 2025-12-04T11:27:59.7798271Z Retrying single test... 
2025-12-04T11:27:59.7798931Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml 2025-12-04T11:27:59.7799720Z ============================= test session starts ============================== 2025-12-04T11:27:59.7800369Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:59.7801105Z cachedir: .pytest_cache 2025-12-04T11:27:59.7801784Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:59.7802621Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:27:59.7802972Z configfile: pytest.ini 2025-12-04T11:27:59.7803719Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:27:59.7804642Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T11:27:59.7817317Z stepcurrent: skipping 3 already run items. 
Running only test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation 2025-12-04T11:27:59.7818379Z Running 1 items in this shard 2025-12-04T11:27:59.7818593Z 2025-12-04T11:27:59.7819572Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation W1204 11:27:11.801000 100899 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:27:59.7820747Z ('RERUN', {'yellow': True}) [7.0158s] [100%] 2025-12-04T11:27:59.7821594Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5736s] [100%] 2025-12-04T11:27:59.7822792Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation FAILED [1.5633s] [100%] 2025-12-04T11:27:59.7823582Z 2025-12-04T11:27:59.7823738Z ==================================== RERUNS ==================================== 2025-12-04T11:27:59.7824335Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.7824917Z Traceback (most recent call last): 2025-12-04T11:27:59.7825614Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.7826446Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.7828052Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.7829772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7830243Z frames [('total', 505), ('ok', 489)] 2025-12-04T11:27:59.7830643Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7831188Z inductor [('pattern_matcher_count', 21), 
('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7831884Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7862271Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.7892407Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.7910506Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.7929888Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.7931623Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.7932743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7933700Z warnings.warn( 2025-12-04T11:27:59.7934563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7935564Z warnings.warn( 2025-12-04T11:27:59.7936437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.7937410Z warnings.warn( 2025-12-04T11:27:59.7938919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.7940917Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7942689Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7944184Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7945989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.7948036Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.7949809Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.7951308Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.7952008Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.7952578Z Traceback (most recent call last): 2025-12-04T11:27:59.7953275Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.7954054Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.7955674Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.7957303Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.7957773Z frames [('total', 505), ('ok', 489)] 2025-12-04T11:27:59.7958117Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.7958666Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.7959390Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.7989740Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to 
PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8020100Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8038060Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8057454Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8059171Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8060251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8061208Z warnings.warn( 2025-12-04T11:27:59.8062082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8063017Z warnings.warn( 2025-12-04T11:27:59.8063882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8064820Z warnings.warn( 2025-12-04T11:27:59.8066392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8068382Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8070197Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8071718Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8073524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.8075608Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8077374Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8078847Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8079444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8079911Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.8080243Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8080676Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8081371Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8112084Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to 
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8142167Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8160021Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8179245Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8181009Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8182089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.8183045Z warnings.warn( 2025-12-04T11:27:59.8183902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8184851Z warnings.warn( 2025-12-04T11:27:59.8185710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8186647Z warnings.warn( 2025-12-04T11:27:59.8188167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8190170Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8191940Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8193447Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8195235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8197297Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.8199065Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8200554Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8201271Z =================================== FAILURES =================================== 2025-12-04T11:27:59.8201871Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.8202676Z Traceback (most recent call last): 2025-12-04T11:27:59.8203382Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.8204163Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.8205822Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.8207465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8207979Z frames [('total', 505), ('ok', 489)] 2025-12-04T11:27:59.8208327Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8208867Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8209607Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8239911Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method 
UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8270146Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8288493Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8307956Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8309682Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8310766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8311728Z warnings.warn( 2025-12-04T11:27:59.8312626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8313569Z warnings.warn( 2025-12-04T11:27:59.8314428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8315409Z warnings.warn( 2025-12-04T11:27:59.8316915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8318977Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8320749Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8322319Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8324120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.8326166Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8327944Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8329441Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8330032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8330490Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.8330836Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8331267Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8331962Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8362293Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to 
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8392528Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8410621Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8429897Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8431624Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8432701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.8433641Z warnings.warn( 2025-12-04T11:27:59.8434509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8435457Z warnings.warn( 2025-12-04T11:27:59.8436310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8437253Z warnings.warn( 2025-12-04T11:27:59.8438785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8440811Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8442642Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8444129Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8445978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8448045Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.8449845Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8451344Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8451972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8452447Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.8452795Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8453244Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8453941Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8484314Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8514658Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8532697Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8552378Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8554100Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8555218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping
2025-12-04T11:27:59.8556175Z warnings.warn(
2025-12-04T11:27:59.8557049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8558011Z warnings.warn(
2025-12-04T11:27:59.8558874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8559850Z warnings.warn(
2025-12-04T11:27:59.8561380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8563483Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8565264Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8566765Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8568564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8570632Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8572385Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8573883Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8574925Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml -
2025-12-04T11:27:59.8575901Z =========================== short test summary info ============================
2025-12-04T11:27:59.8578098Z FAILED [1.5633s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8580235Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:27:59.8580745Z ================== 1 failed, 7 deselected, 2 rerun in 10.18s ===================
2025-12-04T11:27:59.8581180Z Got exit code 1
2025-12-04T11:27:59.8581448Z Retrying single test...
2025-12-04T11:27:59.8582173Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml
2025-12-04T11:27:59.8582972Z ============================= test session starts ==============================
2025-12-04T11:27:59.8583623Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:27:59.8584198Z cachedir: .pytest_cache
2025-12-04T11:27:59.8584894Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:27:59.8585716Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:27:59.8586063Z configfile: pytest.ini
2025-12-04T11:27:59.8586807Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:27:59.8587753Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T11:27:59.8588753Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation
2025-12-04T11:27:59.8589683Z Running 1 items in this shard
2025-12-04T11:27:59.8589891Z
2025-12-04T11:27:59.8590825Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation W1204 11:27:32.824000 101187 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:27:59.8592017Z ('RERUN', {'yellow': True}) [7.0124s] [100%]
2025-12-04T11:27:59.8592857Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5637s] [100%]
2025-12-04T11:27:59.8594068Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation FAILED [1.5494s] [100%]
2025-12-04T11:27:59.8594685Z
2025-12-04T11:27:59.8594826Z ==================================== RERUNS ====================================
2025-12-04T11:27:59.8595431Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.8596013Z Traceback (most recent call last):
2025-12-04T11:27:59.8596701Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.8597476Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.8599098Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8600742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8601416Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.8601770Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8602401Z inductor [('pattern_matcher_count', 21),
('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8603105Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8633506Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8663830Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8681691Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8701239Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8702969Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8704060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8705003Z warnings.warn( 2025-12-04T11:27:59.8705872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8707004Z warnings.warn( 2025-12-04T11:27:59.8707861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8708797Z warnings.warn( 2025-12-04T11:27:59.8710324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8712338Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8714105Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8715596Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8717521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.8719586Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8721396Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8722993Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8723679Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.8724312Z Traceback (most recent call last): 2025-12-04T11:27:59.8725009Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.8725790Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.8727435Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.8729091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8729560Z frames [('total', 505), ('ok', 489)] 2025-12-04T11:27:59.8729905Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8730444Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8731142Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8761451Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to 
PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8791595Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8809769Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8829152Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8830914Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8831995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8832999Z warnings.warn( 2025-12-04T11:27:59.8833860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8834808Z warnings.warn( 2025-12-04T11:27:59.8835672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8836612Z warnings.warn( 2025-12-04T11:27:59.8838119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8840126Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8841899Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8843498Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8845293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.8847344Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8849117Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8850611Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8851208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8851663Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.8852009Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8852436Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.8853127Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8883468Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to 
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.8913704Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.8931881Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.8951455Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.8953195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.8954267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.8955221Z warnings.warn( 2025-12-04T11:27:59.8956091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8957035Z warnings.warn( 2025-12-04T11:27:59.8957898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.8958836Z warnings.warn( 2025-12-04T11:27:59.8960424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8962519Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.8964343Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8965829Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8967628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.8969766Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.8971538Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.8973032Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.8973543Z =================================== FAILURES =================================== 2025-12-04T11:27:59.8974155Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________ 2025-12-04T11:27:59.8974852Z Traceback (most recent call last): 2025-12-04T11:27:59.8975537Z File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation 2025-12-04T11:27:59.8976316Z self.assertEqual(buffer_info[pre][0:2], (2048, 2048)) 2025-12-04T11:27:59.8977934Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.8979587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.8980042Z frames [('total', 505), ('ok', 489)] 2025-12-04T11:27:59.8980389Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.8980940Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.8981635Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.9012129Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method 
UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply 
`@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.9042768Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.9060949Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). 
Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. 
`torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.9080659Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.9082506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.9083602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9084554Z warnings.warn( 2025-12-04T11:27:59.9085438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9086399Z warnings.warn( 2025-12-04T11:27:59.9087271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9088213Z warnings.warn( 2025-12-04T11:27:59.9089743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.9091768Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.9093578Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9095110Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9096914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 
2025-12-04T11:27:59.9099061Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.9100977Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9102489Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9103177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.9103656Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.9104007Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.9104439Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.9105165Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.9135756Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to 
force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.9165949Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.9183821Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.9203284Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.9205012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.9206163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.9207119Z warnings.warn( 2025-12-04T11:27:59.9208000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9208932Z warnings.warn( 2025-12-04T11:27:59.9209829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9210770Z warnings.warn( 2025-12-04T11:27:59.9212328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.9214374Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.9216138Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9217635Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9219442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.9221515Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.9223283Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9224777Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9225370Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:27:59.9225837Z frames [('total', 464), ('ok', 448)] 2025-12-04T11:27:59.9226169Z stats [('calls_captured', 30)] 2025-12-04T11:27:59.9226600Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)] 2025-12-04T11:27:59.9227292Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)] 2025-12-04T11:27:59.9257740Z graph_break [("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n Hint: Avoid calling the function `CUDABackend.parse_options`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n Explanation: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n Developer debug context: module: , qualname: pybind11_object.__new__, skip reason: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n Hint: Avoid calling `builder.set_loc` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: 
'/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n Hint: Avoid calling `builder.create_module` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n Hint: Avoid calling the function `CudaLauncher.__init__`.\n Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n Hint: Please file an issue to PyTorch.\n\n Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)] 2025-12-04T11:27:59.9288441Z aten_mm_info [('aten.mm_32_32_32', 4)] 2025-12-04T11:27:59.9306530Z unimplemented [('Attempt to trace generator\n Explanation: Generators cannot be compiled directly with `torch.compile`.\n Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. 
Consider finding a workaround.\n Hint: Use `torch.cond` to express dynamic control flow.\n\n Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n Hint: Please report an issue to PyTorch.\n\n Developer debug context: call_method UserDefinedClassVariable() __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n Hint: Avoid calling `list_iterator.__next__` in your code.\n Hint: Please report an issue to PyTorch.\n Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). 
This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)] 2025-12-04T11:27:59.9325992Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)] 2025-12-04T11:27:59.9327731Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:27:59.9328806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T11:27:59.9329759Z warnings.warn( 2025-12-04T11:27:59.9330623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9331567Z warnings.warn( 2025-12-04T11:27:59.9332429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T11:27:59.9333375Z warnings.warn( 2025-12-04T11:27:59.9334899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.9336906Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 2025-12-04T11:27:59.9338682Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9340189Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9342050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind). 2025-12-04T11:27:59.9344119Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround. 
2025-12-04T11:27:59.9345891Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`. 2025-12-04T11:27:59.9347428Z torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints)) 2025-12-04T11:27:59.9348468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml - 2025-12-04T11:27:59.9349455Z =========================== short test summary info ============================ 2025-12-04T11:27:59.9351643Z FAILED [1.5494s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0' 2025-12-04T11:27:59.9353848Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:27:59.9354360Z ================== 1 failed, 7 deselected, 2 rerun in 10.15s =================== 2025-12-04T11:27:59.9354795Z Got exit code 1 2025-12-04T11:27:59.9355448Z FAILED CONSISTENTLY: test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation 2025-12-04T11:27:59.9356491Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:27:59.9357525Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml 2025-12-04T11:27:59.9358305Z ============================= test session starts ============================== 2025-12-04T11:27:59.9358955Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:27:59.9359547Z cachedir: .pytest_cache 2025-12-04T11:27:59.9360243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:27:59.9360999Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:27:59.9361349Z configfile: pytest.ini 2025-12-04T11:27:59.9362191Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:27:59.9363115Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T11:27:59.9363578Z stepcurrent: skipping 4 already run items. 
2025-12-04T11:27:59.9363956Z Running 4 items in this shard 2025-12-04T11:27:59.9364162Z 2025-12-04T11:27:59.9365085Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory W1204 11:27:54.628000 101475 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:27:59.9366206Z PASSED [6.2993s] [ 25%] 2025-12-04T11:27:59.9366858Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_bfs PASSED [0.7725s] [ 50%] 2025-12-04T11:27:59.9367936Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_dfs PASSED [0.7749s] [ 75%] 2025-12-04T11:27:59.9369014Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_lpmf PASSED [0.7747s] [100%] 2025-12-04T11:27:59.9369619Z 2025-12-04T11:27:59.9370282Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml - 2025-12-04T11:27:59.9371265Z ======================= 4 passed, 4 deselected in 8.65s ======================== 2025-12-04T11:27:59.9372256Z The following tests failed consistently: ['test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation'] 2025-12-04T11:27:59.9373006Z 2025-12-04T11:27:59.9373499Z FINISHED PRINTING LOG FILE of inductor/test_memory 1/1 (test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log) 2025-12-04T11:27:59.9374101Z 2025-12-04T11:27:59.9374428Z Finished inductor/test_memory 1/1 ... 
[2025-12-04 11:27:59.702035][8037.311935471], took 1.42min 2025-12-04T11:27:59.9375617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml 2025-12-04T11:27:59.9377232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml 2025-12-04T11:27:59.9378829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml 2025-12-04T11:27:59.9380411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml 2025-12-04T11:28:00.2543104Z Uploading logs for 57119749427 to S3 2025-12-04T11:28:00.3313217Z Uploading artifacts took 0.37 seconds 2025-12-04T11:28:00.3313614Z inductor/test_memory 1/1 failed! 2025-12-04T11:28:00.3317898Z Running dynamo/test_streams 1/1 ... [2025-12-04 11:28:00.331613][8037.941519721] 2025-12-04T11:28:00.3318470Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:28:00.3323409Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_streams.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:28:00.332078] 2025-12-04T11:28:18.2227873Z 2025-12-04T11:28:18.2228785Z dynamo/test_streams 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_streams_1.1_834a989fad2ef2e3_.log 2025-12-04T11:28:18.2239113Z Running 28 items in this shard: test/dynamo/test_streams.py::TestStreams::test_current_stream_api, test/dynamo/test_streams.py::TestStreams::test_event_tracing, test/dynamo/test_streams.py::TestStreams::test_event_weakref, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return_different_device, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return_no_index, test/dynamo/test_streams.py::TestStreams::test_inductor_lowering, test/dynamo/test_streams.py::TestStreams::test_is_marked_side_effectful, test/dynamo/test_streams.py::TestStreams::test_local_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_local_stream_nested_enter_exit, test/dynamo/test_streams.py::TestStreams::test_local_stream_return, test/dynamo/test_streams.py::TestStreams::test_nested_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_nested_stream_enter_exit_graph_break, test/dynamo/test_streams.py::TestStreams::test_new_event_api, test/dynamo/test_streams.py::TestStreams::test_new_stream_api, test/dynamo/test_streams.py::TestStreams::test_record_stream_problem_basic, test/dynamo/test_streams.py::TestStreams::test_record_stream_problem_interleaved, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_fork_join, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_wait_record, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_wait_record_stream, test/dynamo/test_streams.py::TestStreams::test_stream_backward_simple, test/dynamo/test_streams.py::TestStreams::test_stream_backward_sync, test/dynamo/test_streams.py::TestStreams::test_stream_context_graph_break, 
test/dynamo/test_streams.py::TestStreams::test_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_stream_enter_exit_graph_break, test/dynamo/test_streams.py::TestStreams::test_stream_input, test/dynamo/test_streams.py::TestStreams::test_stream_weakref, test/dynamo/test_streams.py::TestStreams::test_stream_with_mutation 2025-12-04T11:28:18.2248896Z 2025-12-04T11:28:18.2249226Z Finished dynamo/test_streams 1/1 ... [2025-12-04 11:28:18.222576][8055.832485802], took 0.30min 2025-12-04T11:28:18.2326619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.xml 2025-12-04T11:28:18.3227569Z Running inductor/test_unbacked_symints 1/1 ... [2025-12-04 11:28:18.322409][8055.932315723] 2025-12-04T11:28:18.3228203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:28:18.3230717Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_unbacked_symints.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:28:18.322816] 2025-12-04T11:31:57.3170389Z 2025-12-04T11:31:57.3171611Z PRINTING LOG FILE of inductor/test_unbacked_symints 1/1 (test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log) 2025-12-04T11:31:57.3173927Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml 2025-12-04T11:31:57.3175262Z ============================= test session starts ============================== 2025-12-04T11:31:57.3176251Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.3177099Z cachedir: .pytest_cache 2025-12-04T11:31:57.3178023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.3179036Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.3179437Z configfile: pytest.ini 2025-12-04T11:31:57.3180256Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.3181290Z collecting ... 
collected 32 items 2025-12-04T11:31:57.3181833Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:31:57.3208509Z Running 32 items in this shard: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotune_with_unbacked_stride_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotuning_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_broadcast_tensors_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_combo_kernel_size_hint_failure_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_einsum_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_equivalent_backed_unbacked_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_ok_with_runtime_assert_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_issue_143498_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_nonzero_in_inference_mode_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_softmax_cuda, 
test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_split_with_sizes_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_to_int_with_unbacked_size_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_grid_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_with_unbacked_symint_fallback_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_linear_layer_norm_input_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_masked_scatter_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_range_tree_divisor_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_repeat_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic2_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_vertical_pointwise_reduction_fusion_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_view_of_slice_cuda 2025-12-04T11:31:57.3231535Z 2025-12-04T11:31:57.3232526Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotune_with_unbacked_stride_cuda W1204 11:28:31.577000 102009 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:31:57.3233721Z PASSED [3.7078s] [ 3%] 2025-12-04T11:31:57.3234327Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotuning_cuda PASSED [0.7196s] [ 6%] 2025-12-04T11:31:57.3235352Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_broadcast_tensors_cuda PASSED [1.0656s] [ 9%] 2025-12-04T11:31:57.3236463Z 
inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_combo_kernel_size_hint_failure_cuda PASSED [0.9586s] [ 12%] 2025-12-04T11:31:57.3237530Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_einsum_cuda PASSED [1.3248s] [ 15%] 2025-12-04T11:31:57.3238556Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_equivalent_backed_unbacked_cuda PASSED [0.8151s] [ 18%] 2025-12-04T11:31:57.3239594Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_cuda PASSED [0.2129s] [ 21%] 2025-12-04T11:31:57.3240647Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_ok_with_runtime_assert_cuda PASSED [0.1673s] [ 25%] 2025-12-04T11:31:57.3241724Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_issue_143498_cuda PASSED [0.8675s] [ 28%] 2025-12-04T11:31:57.3242880Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_False_cuda PASSED [0.2551s] [ 31%] 2025-12-04T11:31:57.3244008Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_True_cuda PASSED [0.2240s] [ 34%] 2025-12-04T11:31:57.3245119Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_False_cuda PASSED [0.2654s] [ 37%] 2025-12-04T11:31:57.3246215Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_True_cuda PASSED [0.2278s] [ 40%] 2025-12-04T11:31:57.3247290Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_False_cuda PASSED [0.1822s] [ 43%] 2025-12-04T11:31:57.3248387Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_True_cuda PASSED [0.1832s] [ 46%] 2025-12-04T11:31:57.3249490Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_nonzero_in_inference_mode_cuda PASSED [0.1517s] [ 50%] 2025-12-04T11:31:57.3250698Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda ('RERUN', 
{'yellow': True}) [1.5862s] [ 53%] 2025-12-04T11:31:57.3252043Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda ('RERUN', {'yellow': True}) [1.4845s] [ 53%] 2025-12-04T11:31:57.3253223Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda FAILED [1.4461s] [ 53%] 2025-12-04T11:31:57.3253848Z 2025-12-04T11:31:57.3253991Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.3254571Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3255149Z Traceback (most recent call last): 2025-12-04T11:31:57.3255895Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3256680Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3257478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3258195Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3258782Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3259428Z def fn(x, y): 2025-12-04T11:31:57.3259997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3260672Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3261348Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.3262066Z return compiled_fn(full_args) 2025-12-04T11:31:57.3262873Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3263742Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3264614Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.3265467Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3266253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3267078Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3267881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3268671Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3269459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3270251Z outs = compiled_fn(args) 2025-12-04T11:31:57.3270907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3271612Z return self.current_callable(inputs) 2025-12-04T11:31:57.3272258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3272909Z out = model(new_inputs) 2025-12-04T11:31:57.3273550Z File "/tmp/tmpzk88d_6q/iq/ciqi2pkzc6ppzct2bxn5qysanloemqavdl46uw4qpca7rbcygols.py", line 232, in call 2025-12-04T11:31:57.3274502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.3275113Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3275584Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.3275948Z 2025-12-04T11:31:57.3276160Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3277088Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3277817Z 2025-12-04T11:31:57.3278079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3278706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3279256Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3279855Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3281809Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3283586Z graph_break [] 2025-12-04T11:31:57.3284019Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3284611Z Traceback (most recent call last): 2025-12-04T11:31:57.3285347Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3286122Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3286888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3287607Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3288183Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3288787Z def fn(x, y): 2025-12-04T11:31:57.3289351Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3290020Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3290696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.3291398Z return compiled_fn(full_args) 2025-12-04T11:31:57.3292212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3293079Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3293947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.3294784Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3295574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3296394Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3297186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3297985Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3298774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3299567Z outs = compiled_fn(args) 2025-12-04T11:31:57.3300213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3301123Z return self.current_callable(inputs) 2025-12-04T11:31:57.3301787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3302429Z out = model(new_inputs) 2025-12-04T11:31:57.3303065Z File 
"/tmp/tmpb3ca8a_1/f2/cf2axkrhxyb3addnvgov27lxtexa37daf537i2bseo4z4pj26rjb.py", line 232, in call 2025-12-04T11:31:57.3304016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.3304628Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3305082Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.3305462Z 2025-12-04T11:31:57.3305670Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3306590Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3307394Z 2025-12-04T11:31:57.3307672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3308280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3308800Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3309396Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3311384Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3313122Z graph_break [] 2025-12-04T11:31:57.3313501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3314068Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3314661Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3316556Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), 
('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3318247Z graph_break [] 2025-12-04T11:31:57.3318542Z =================================== FAILURES =================================== 2025-12-04T11:31:57.3319124Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3319654Z Traceback (most recent call last): 2025-12-04T11:31:57.3320392Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3321163Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3321891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3322899Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3323825Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3324555Z def fn(x, y): 2025-12-04T11:31:57.3325225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3326079Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3326809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.3327611Z return compiled_fn(full_args) 2025-12-04T11:31:57.3328600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3329592Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3330547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in 
call_func_at_runtime_with_args 2025-12-04T11:31:57.3331577Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3332496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3333361Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3334342Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3335253Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3336159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3337097Z outs = compiled_fn(args) 2025-12-04T11:31:57.3337864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3338718Z return self.current_callable(inputs) 2025-12-04T11:31:57.3339569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3340276Z out = model(new_inputs) 2025-12-04T11:31:57.3341115Z File "/tmp/tmp4vd8saqk/ju/cjuv7bhfdqljwtykv6sxmap45z57mfp65htxyudkics6zhsum7hk.py", line 232, in call 2025-12-04T11:31:57.3342222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.3342940Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3343557Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.3344029Z 2025-12-04T11:31:57.3344253Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3364506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3365338Z 2025-12-04T11:31:57.3365622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3366240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3366770Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3367373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3369292Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3370992Z graph_break [] 2025-12-04T11:31:57.3371366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3371884Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3372481Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3374378Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3376068Z graph_break [] 2025-12-04T11:31:57.3376439Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3376955Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3377539Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3379453Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3381143Z graph_break [] 2025-12-04T11:31:57.3382084Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml - 2025-12-04T11:31:57.3383147Z =========================== short test summary info ============================ 2025-12-04T11:31:57.3384240Z FAILED [1.4461s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.3385197Z 2025-12-04T11:31:57.3385414Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3386338Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3387049Z 2025-12-04T11:31:57.3387311Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3387931Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:31:57.3388436Z ==================== 1 failed, 16 passed, 2 rerun in 15.91s ==================== 2025-12-04T11:31:57.3388867Z Got exit code 1 2025-12-04T11:31:57.3389117Z Retrying single test... 
2025-12-04T11:31:57.3389929Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml 2025-12-04T11:31:57.3390818Z ============================= test session starts ============================== 2025-12-04T11:31:57.3391485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.3392076Z cachedir: .pytest_cache 2025-12-04T11:31:57.3392777Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.3393542Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.3393875Z configfile: pytest.ini 2025-12-04T11:31:57.3394638Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.3395563Z collecting ... collected 32 items / 31 deselected / 1 selected 2025-12-04T11:31:57.3396574Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3397468Z Running 1 items in this shard 2025-12-04T11:31:57.3397687Z 2025-12-04T11:31:57.3398622Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:01.422368080 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T11:31:57.3400197Z [W1204 11:29:01.422391520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:31:57.3401038Z 2025-12-04T11:31:57.3401226Z ('RERUN', {'yellow': True}) [20.3490s] [100%] 2025-12-04T11:31:57.3402524Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:18.573688796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.3403650Z 2025-12-04T11:31:57.3403784Z ('RERUN', {'yellow': True}) [1.4459s] [100%] 2025-12-04T11:31:57.3405019Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:20.986784791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.3406122Z 2025-12-04T11:31:57.3406236Z FAILED [1.4106s] [100%] 2025-12-04T11:31:57.3406407Z 2025-12-04T11:31:57.3406545Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.3407122Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3407665Z Traceback (most recent call last): 2025-12-04T11:31:57.3408402Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3409168Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3409920Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3410637Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3411319Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3411911Z def fn(x, y): 2025-12-04T11:31:57.3412489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3413155Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3413809Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.3414520Z return compiled_fn(full_args) 2025-12-04T11:31:57.3415387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3416253Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3417150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.3418009Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3418802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3419661Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3420472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3421274Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3422066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3422855Z outs = compiled_fn(args) 2025-12-04T11:31:57.3423517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3424234Z return self.current_callable(inputs) 2025-12-04T11:31:57.3424883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3425515Z out = model(new_inputs) 2025-12-04T11:31:57.3426195Z File "/tmp/tmp0crgso0o/xs/cxsear67vvgewktho4wjienlirzjq7esl7uuxpz24mb2iy7tu5av.py", line 232, in call 2025-12-04T11:31:57.3427162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 
2025-12-04T11:31:57.3427763Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3428239Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.3429351Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.3430321Z C++ CapturedTraceback: 2025-12-04T11:31:57.3431779Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.3433670Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.3434608Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.3435873Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.3438048Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3441128Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3450035Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, 
std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3460803Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.3465709Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3467816Z #13 
at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.3472850Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3478153Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.3480004Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.3484751Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.3489060Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.3490551Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 
2025-12-04T11:31:57.3492229Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T11:31:57.3498058Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.3503835Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.3504546Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.3505273Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.3505938Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.3506787Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3507594Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3508377Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3509106Z #29 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3509954Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3510889Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3511794Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3512699Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3513593Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3514499Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3515314Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3516043Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3516763Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3517602Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3518505Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3519406Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3520296Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3521056Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3521814Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3522709Z #45 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3523427Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3524162Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3524998Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3525890Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3526789Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3527696Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3528600Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3529490Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3530255Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3530828Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.3531439Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3532330Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3532993Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.3533347Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.3533968Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3534728Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3535488Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T11:31:57.3536499Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3537396Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3538224Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3538908Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3539656Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3540451Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3541137Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3541908Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3542694Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3543376Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3544134Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3544930Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3545597Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3546355Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3547113Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3547858Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3548618Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 
2025-12-04T11:31:57.3549382Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3550139Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3550888Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3551795Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3552695Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3553597Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3554486Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3555388Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3556290Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3557192Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3558115Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3559019Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3559921Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3560702Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3561412Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3562171Z #95 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3563169Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3563984Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3564718Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3565476Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3566321Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3567234Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3568162Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3569090Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3569867Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3570648Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3571566Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3572494Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3573414Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3574333Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3575187Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3575989Z #111 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3576732Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3577443Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.3578101Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3578879Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3579804Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3580716Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3581640Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3582564Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3583341Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3584100Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3585019Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3586040Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3586968Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3587877Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3588651Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3589463Z #127 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3590379Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3591282Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3592227Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3593142Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3594024Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3594819Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3595570Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3596308Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3597154Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3598075Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3598853Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3599628Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3600542Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3601784Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3602814Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3603739Z #143 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3604599Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3605400Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3606146Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3606869Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3607727Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3608655Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3609576Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3610490Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3611411Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3612325Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3613104Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3613865Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3614882Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3615808Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3616733Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 
2025-12-04T11:31:57.3617639Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3618555Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3619355Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3620081Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3620868Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3621717Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3622680Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3623587Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3624503Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3625420Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3626337Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3627242Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3628159Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3629083Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3630002Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3630794Z #174 PyEval_EvalCode from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.3631527Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.3632241Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.3632913Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.3633687Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.3634498Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.3635246Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.3635931Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.3636602Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.3637198Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.3637613Z #184 _start from ??:0 2025-12-04T11:31:57.3637906Z #185 from ??:0 2025-12-04T11:31:57.3638151Z 2025-12-04T11:31:57.3638156Z 2025-12-04T11:31:57.3638371Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3639304Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3640024Z 2025-12-04T11:31:57.3640288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3640910Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3641429Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3643254Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), 
('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3645068Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3645637Z graph_break [] 2025-12-04T11:31:57.3646080Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3646620Z Traceback (most recent call last): 2025-12-04T11:31:57.3647379Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3648152Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3648897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3649652Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3650212Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3650809Z def fn(x, y): 2025-12-04T11:31:57.3651388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3652045Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3652718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.3653427Z return compiled_fn(full_args) 2025-12-04T11:31:57.3654224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3655087Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3655957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.3656806Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3657597Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3658427Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3659237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3660041Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3660823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3661620Z outs = compiled_fn(args) 2025-12-04T11:31:57.3662277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3662986Z return self.current_callable(inputs) 2025-12-04T11:31:57.3663642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3664281Z out = model(new_inputs) 2025-12-04T11:31:57.3664936Z File "/tmp/tmp6k3ykq_m/qz/cqze7pled3tdw4klik77kfgsmkopndkuq3mwgtsdaohpcttno6f5.py", line 232, in call 2025-12-04T11:31:57.3665875Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.3666492Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3666960Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.3668060Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.3669022Z C++ CapturedTraceback: 2025-12-04T11:31:57.3670537Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.3672409Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.3673374Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.3674613Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.3676819Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3679933Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3688862Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3699561Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.3704708Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3706857Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.3711792Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3717125Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.3718930Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.3723729Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.3728045Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.3729526Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.3731208Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.3737081Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.3742553Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.3743271Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.3743988Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.3744649Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.3745418Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3746231Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3746950Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3747678Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3748516Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.3749413Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3750314Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3751221Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3752129Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3753017Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3753823Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3754561Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3755283Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3756104Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3757008Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3757911Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3758813Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3759566Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3760321Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3761177Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3761897Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3762701Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3763540Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3764479Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3765366Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3766305Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3767205Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3768110Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3768893Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3769406Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.3770013Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3770906Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3771566Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.3771922Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.3772526Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3773277Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3774036Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3774945Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3775846Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3776628Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3777304Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3778063Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3778855Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3779538Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3780302Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3781100Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3781780Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3782537Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3783329Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3784013Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3784763Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3785530Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3786295Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3787045Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3787850Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3788619Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3789378Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3790270Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3791202Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3792107Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3793019Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3793950Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3794855Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3795785Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3796673Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3797577Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3798482Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3799271Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.3799936Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3800695Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3801798Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3802685Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3803417Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3804145Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3804997Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3805929Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3806836Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3807764Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3808542Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3809299Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3810220Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3811136Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3812052Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3812954Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3813826Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3814622Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3815366Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3816057Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.3816814Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3817595Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3818504Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3819473Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3820394Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3821311Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3822116Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3822891Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3823856Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3824779Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3825682Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3826597Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3827368Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3828148Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3829053Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3829968Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3830887Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3831803Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3832653Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3834167Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3834912Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3835635Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3836485Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3837402Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3838174Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3838937Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3839852Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3840767Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3841686Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3842691Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3843557Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3844362Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3845156Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3845886Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3846736Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3847654Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3848594Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3849518Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3850437Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3851386Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3852144Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.3852948Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3853863Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3854781Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3855684Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3856607Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3857470Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.3858273Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3859003Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3859747Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3860593Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.3861505Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3862428Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3863343Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3864263Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3865169Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3866087Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3867014Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3867931Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3868836Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3869638Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.3870365Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.3871075Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.3871752Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.3872517Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.3873365Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.3874106Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.3874802Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.3875468Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.3876063Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.3876507Z #184 _start from ??:0 2025-12-04T11:31:57.3876798Z #185 from ??:0 2025-12-04T11:31:57.3877026Z 2025-12-04T11:31:57.3877031Z 2025-12-04T11:31:57.3877255Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.3878209Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.3878933Z 2025-12-04T11:31:57.3879197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.3879854Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3880371Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3882013Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3883924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3884470Z 
graph_break [] 2025-12-04T11:31:57.3884842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.3885341Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.3885939Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.3887855Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.3889545Z graph_break [] 2025-12-04T11:31:57.3889842Z =================================== FAILURES =================================== 2025-12-04T11:31:57.3890408Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.3890952Z Traceback (most recent call last): 2025-12-04T11:31:57.3891694Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.3892461Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.3893205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.3893927Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3894502Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.3895092Z def fn(x, y): 2025-12-04T11:31:57.3895665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.3896335Z return fn(*args, **kwargs) 2025-12-04T11:31:57.3896992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 
2025-12-04T11:31:57.3897706Z return compiled_fn(full_args) 2025-12-04T11:31:57.3898523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.3899386Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.3900278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.3901292Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.3902091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.3902906Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.3903804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.3904606Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.3905397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.3906230Z outs = compiled_fn(args) 2025-12-04T11:31:57.3906890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.3907649Z return self.current_callable(inputs) 2025-12-04T11:31:57.3908298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.3908928Z out = model(new_inputs) 2025-12-04T11:31:57.3909604Z File "/tmp/tmpcnm13bt0/d7/cd7ienhq7syisf2qdafw5dp4zbzrps2m3gys7cbv7re7algg3qc3.py", line 232, in call 2025-12-04T11:31:57.3910574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.3911173Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.3911640Z RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.3912751Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.3913727Z C++ CapturedTraceback: 2025-12-04T11:31:57.3915184Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.3917055Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.3917998Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.3919253Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.3921436Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3924559Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3933427Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3944171Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.3949075Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.3951157Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.3956135Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.3961427Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.3963319Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.3968002Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.3972369Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.3973857Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.3975541Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.3981325Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.3986748Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.3987468Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.3988200Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.3988848Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.3989618Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3990432Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3991202Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3991922Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.3992769Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.3993681Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3994700Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3995594Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3996532Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.3997435Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.3998232Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.3998995Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.3999725Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4000567Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4001691Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4002690Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4003590Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4004357Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4005108Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4005918Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4006643Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4007352Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4008193Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4009098Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4010000Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4010895Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4011798Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4012708Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4013472Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4013973Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4014580Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4015488Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4016144Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.4016488Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4017090Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4017851Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4018699Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4019606Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4020507Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4021303Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4022017Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4022783Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4023579Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4024294Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4025058Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4025851Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4026573Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4027317Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4028105Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4028784Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4029544Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4030295Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4031065Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4031824Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4032572Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4033332Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4034088Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4034993Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4035885Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4036788Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4037689Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4038588Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4039480Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4040384Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4041282Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4042177Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4043146Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4043943Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4044621Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4045364Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4046276Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4047062Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4047795Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4048504Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4049386Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4050316Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4051245Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4052190Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4052968Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4053777Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4054685Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4055605Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4056522Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4057437Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4058288Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4059089Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4059829Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4060536Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4061197Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4061975Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4062902Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4063840Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4064755Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4065682Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4066468Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4067237Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4068165Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4069088Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4070014Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4070925Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4071705Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4072482Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4073405Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4074356Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4075279Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4076200Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4077066Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4077893Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4078637Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4079379Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4080252Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4081173Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4081980Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4082847Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4083761Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4084680Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4085599Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4086521Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4087377Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4088179Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4088930Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4089667Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4090505Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4091423Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4092343Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4093250Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4094171Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4095090Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4095867Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4096625Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4097544Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4098460Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4099380Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4100283Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4101307Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4102113Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4102947Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4103685Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4104540Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4105464Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4106411Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4107334Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4108249Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4109209Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4110116Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4111084Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4112003Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4112920Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4113709Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.4114443Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.4115155Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.4115840Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.4116599Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.4117408Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.4118153Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.4118840Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.4119510Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.4120105Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.4120533Z #184 _start from ??:0 2025-12-04T11:31:57.4120815Z #185 from ??:0 2025-12-04T11:31:57.4121058Z 2025-12-04T11:31:57.4121063Z 2025-12-04T11:31:57.4121275Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.4122208Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4123025Z 2025-12-04T11:31:57.4123305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.4123917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4124438Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4126089Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4127905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4128440Z 
graph_break [] 2025-12-04T11:31:57.4128810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4129338Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4129976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4131909Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4133602Z graph_break [] 2025-12-04T11:31:57.4133971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4134471Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4135097Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4137010Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4138736Z graph_break [] 2025-12-04T11:31:57.4139672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml - 2025-12-04T11:31:57.4140740Z =========================== short test summary info ============================ 2025-12-04T11:31:57.4141834Z FAILED [1.4106s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: 
FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.4143478Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.4144455Z C++ CapturedTraceback: 2025-12-04T11:31:57.4145917Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.4147794Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.4148732Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.4149998Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.4152192Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4155269Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4164207Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4174944Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.4179864Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4181949Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.4186953Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4192252Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.4194082Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.4198765Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.4203497Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.4204981Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.4206666Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.4212457Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.4217872Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.4218584Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.4219305Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4219952Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.4220715Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4221603Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4222341Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4223053Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4223895Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.4224842Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4225748Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4226638Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4227604Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4228504Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4229337Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4230063Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4230782Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4231617Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4232505Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4233405Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4234309Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4235072Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4235823Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4236632Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4237358Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4238093Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4238930Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4239849Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4240754Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4241644Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4242661Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4243575Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4244346Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4244854Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4245471Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4246385Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4247045Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.4247390Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4248057Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4248835Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4249650Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4267315Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4268364Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4269175Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4269970Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4270729Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4271576Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4272253Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4273008Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4273846Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4274522Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4275286Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4276067Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4276750Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4277506Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4278263Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4279021Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4279779Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4280536Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4281281Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4282039Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4283042Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4283949Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4284841Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4285743Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4286650Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4287546Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4288447Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4289365Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4290269Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4291162Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4291953Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4292637Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4293394Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4294298Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4295086Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4295820Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4296556Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4297431Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4298360Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4299311Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4300220Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4301273Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4302139Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4303061Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4303970Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4304891Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4305810Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4306678Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4307460Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4308205Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4308912Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4309575Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4310334Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4311258Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4312185Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4313084Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4314014Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4314793Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4315566Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4316475Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4317391Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4318311Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4319228Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4319982Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4320756Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4321675Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4322739Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4323650Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4324572Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4325477Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4326262Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4327003Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4327785Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4328639Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4329547Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4330353Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4331127Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4332043Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4332952Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4333871Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4334793Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4335654Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4336445Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4337191Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4337927Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4338763Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4339687Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4340602Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4341519Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4342429Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4343352Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4344129Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4344905Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4345818Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4346745Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4347662Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4348582Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4349440Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4350290Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4351040Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4351763Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4352615Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4353605Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4354525Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4355434Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4356399Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4357311Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4358265Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4359173Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4360084Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4361001Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4361806Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.4362616Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.4363337Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.4364025Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.4364780Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.4365604Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.4366368Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.4367065Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.4367727Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.4368327Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.4368755Z #184 _start from ??:0 2025-12-04T11:31:57.4369037Z #185 from ??:0 2025-12-04T11:31:57.4369283Z 2025-12-04T11:31:57.4369288Z 2025-12-04T11:31:57.4369503Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.4370447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4371169Z 2025-12-04T11:31:57.4371446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.4372025Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:31:57.4372543Z ================== 1 failed, 31 deselected, 2 rerun in 23.24s ================== 2025-12-04T11:31:57.4372980Z Got exit code 1 2025-12-04T11:31:57.4373247Z Retrying single test... 
2025-12-04T11:31:57.4374028Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml 2025-12-04T11:31:57.4374930Z ============================= test session starts ============================== 2025-12-04T11:31:57.4375592Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.4376168Z cachedir: .pytest_cache 2025-12-04T11:31:57.4376928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.4377702Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.4378055Z configfile: pytest.ini 2025-12-04T11:31:57.4378811Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.4379741Z collecting ... collected 32 items / 31 deselected / 1 selected 2025-12-04T11:31:57.4380806Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4381718Z Running 1 items in this shard 2025-12-04T11:31:57.4381961Z 2025-12-04T11:31:57.4382903Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:37.237515166 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T11:31:57.4384519Z [W1204 11:29:37.237537088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:31:57.4385172Z 2025-12-04T11:31:57.4385304Z ('RERUN', {'yellow': True}) [20.4072s] [100%] 2025-12-04T11:31:57.4386539Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:54.428333142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.4387659Z 2025-12-04T11:31:57.4387789Z ('RERUN', {'yellow': True}) [1.4503s] [100%] 2025-12-04T11:31:57.4389015Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:56.849370129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.4390128Z 2025-12-04T11:31:57.4390225Z FAILED [1.4188s] [100%] 2025-12-04T11:31:57.4390395Z 2025-12-04T11:31:57.4390546Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.4391114Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.4391663Z Traceback (most recent call last): 2025-12-04T11:31:57.4392400Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.4393179Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.4393911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.4394636Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4395208Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.4395800Z def fn(x, y): 2025-12-04T11:31:57.4396378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.4397047Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4397717Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.4398423Z return compiled_fn(full_args) 2025-12-04T11:31:57.4399253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.4400122Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.4401280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.4402145Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.4403046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.4403885Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.4404779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.4405589Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.4406379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.4407189Z outs = compiled_fn(args) 2025-12-04T11:31:57.4407882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.4408606Z return self.current_callable(inputs) 2025-12-04T11:31:57.4409260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.4409939Z out = model(new_inputs) 2025-12-04T11:31:57.4410610Z File "/tmp/tmp5a7naebo/b3/cb3ut5djh46v5f4z2ofuyumglu2gofja3wthfdneyepvef4lkznn.py", line 232, in call 2025-12-04T11:31:57.4411577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 
2025-12-04T11:31:57.4412240Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.4412704Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.4413819Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.4414804Z C++ CapturedTraceback: 2025-12-04T11:31:57.4416288Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.4418158Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.4419101Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.4420377Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.4422563Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4425661Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4434497Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, 
std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4445361Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.4450319Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4452428Z #13 
at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.4457386Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4462663Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.4464491Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.4469343Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.4473700Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.4475229Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 
2025-12-04T11:31:57.4476900Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional) from :0 2025-12-04T11:31:57.4482790Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.4488251Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.4488960Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.4489696Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4490362Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.4491114Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4491924Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4492655Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4493379Z #29 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4494206Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4495113Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4496068Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4496977Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4497864Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4498771Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4499603Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4500329Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4501205Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4502126Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4503034Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4503967Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4504866Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4505628Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4506390Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4507183Z #45 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4507910Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4508641Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4509476Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4510378Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4511281Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4512185Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4513093Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4513989Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4514751Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4515276Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4515870Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4516775Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4517436Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.4517793Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4518384Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4519144Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4519900Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 
2025-12-04T11:31:57.4520786Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4521691Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4522588Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4523278Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4524123Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4524920Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4525601Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4526364Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4527184Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4527866Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4528625Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4529436Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4530113Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4530904Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4531662Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4532404Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4533160Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 
2025-12-04T11:31:57.4533912Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4534676Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4535420Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4536319Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4537292Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4538186Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4539084Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4539992Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4540898Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4541783Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4542686Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4543594Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4544499Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4545282Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4545963Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4546715Z #95 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4547572Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4548340Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4549071Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4549802Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4550639Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4551627Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4552553Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4553477Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4554241Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4555058Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4555981Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4556941Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4557850Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4558828Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4559699Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4560503Z #111 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4561234Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4561943Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4562713Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4563475Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4564399Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4565324Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4566244Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4567153Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4567927Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4568734Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4569650Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4570557Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4571476Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4572399Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4573160Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4573925Z #127 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4574844Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4575770Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4576673Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4577591Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4578461Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4579255Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4580043Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4580780Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4581629Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4582581Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4583341Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4584111Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4585060Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4585966Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4586918Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4587837Z #143 
_PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4588701Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4589483Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4590230Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4590968Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4591824Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4592733Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4593655Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4594572Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4595489Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4596391Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4597166Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4597936Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4598839Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4599756Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4600672Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 
2025-12-04T11:31:57.4601848Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4602837Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4603640Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4604388Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4605129Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4605971Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4606891Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4607932Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4608860Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4609762Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4610679Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4611639Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4612543Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4613459Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4614418Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4615220Z #174 PyEval_EvalCode from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.4615977Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.4616689Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.4617374Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.4618146Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.4618946Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.4619698Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.4620405Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.4621063Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.4621660Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.4622095Z #184 _start from ??:0 2025-12-04T11:31:57.4622390Z #185 from ??:0 2025-12-04T11:31:57.4622620Z 2025-12-04T11:31:57.4622626Z 2025-12-04T11:31:57.4622839Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.4623777Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4624505Z 2025-12-04T11:31:57.4624772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.4625398Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4625907Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4627573Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), 
('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4629405Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4629957Z graph_break [] 2025-12-04T11:31:57.4630395Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.4630949Z Traceback (most recent call last): 2025-12-04T11:31:57.4631701Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.4632472Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.4633218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.4633955Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4634535Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.4635170Z def fn(x, y): 2025-12-04T11:31:57.4635762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.4636433Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4637104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.4637820Z return compiled_fn(full_args) 2025-12-04T11:31:57.4638679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.4639555Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.4640422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.4641314Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.4642111Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.4643093Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.4643883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.4644686Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.4645482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.4646268Z outs = compiled_fn(args) 2025-12-04T11:31:57.4646926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.4647642Z return self.current_callable(inputs) 2025-12-04T11:31:57.4648290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.4648926Z out = model(new_inputs) 2025-12-04T11:31:57.4649597Z File "/tmp/tmpc22tbkmr/7v/c7vp5axqaeorg7ro46hdflae277p4tydujnrbi65og4m7x4bl36l.py", line 232, in call 2025-12-04T11:31:57.4650561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.4651175Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.4651634Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.4652740Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.4653713Z C++ CapturedTraceback: 2025-12-04T11:31:57.4655181Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.4657051Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.4657989Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.4659250Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.4661432Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4664547Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4673311Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4684159Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.4689071Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4691175Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.4696218Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4701752Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.4703577Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.4708345Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.4712671Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.4714148Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.4715838Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.4721619Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.4727215Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.4727940Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.4728671Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4729333Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.4730123Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4730931Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4731659Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4732417Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4733257Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.4734197Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4735101Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4735993Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4736902Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4737799Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4738605Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4739321Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4740045Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4740883Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4741778Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4742674Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4743578Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4744341Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4745083Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4745896Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4746628Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4747358Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4748188Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4749094Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4749993Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4750897Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4751785Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4752691Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4753452Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4753956Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4754610Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4755512Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4756169Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.4756516Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4757148Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4757912Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4758661Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4759598Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4760503Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4761335Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4762006Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4762835Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4763641Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4764329Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4765076Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4765870Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4766555Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4767306Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4768108Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4768787Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4769549Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4770298Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4771066Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4771826Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4772590Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4773340Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4774096Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4775003Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4775894Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4776790Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4777702Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4778607Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4779496Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4780405Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4781350Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4782256Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4783148Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4783935Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4784667Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4785424Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4786260Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4787082Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4787814Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4788528Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4789409Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4790331Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4791248Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4792153Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4792927Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4793706Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4794625Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4795530Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4796450Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4797369Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4798233Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4799016Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4799761Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4800467Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4801356Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4802133Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4803148Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4804072Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4804975Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4805898Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4806671Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4807446Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4808351Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4809264Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4810267Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4811184Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4811942Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4812715Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4813682Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4814586Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4815539Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4816456Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4817359Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4818142Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4818879Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4819613Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4820465Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4821369Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4822150Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4822924Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4823845Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4824763Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4825682Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4826600Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4827445Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4828239Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4828976Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4829712Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4830548Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4831469Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4832388Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4833312Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4834216Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4835145Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4835922Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4836702Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4837649Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4838579Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4839496Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4840405Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4841304Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4842102Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4842941Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4843792Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4844646Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4845608Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4846533Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4847441Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4848365Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4849280Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4850198Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4851105Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4852026Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4852943Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4853731Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.4854459Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.4855165Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.4855848Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.4856596Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.4857404Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.4858150Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.4858851Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.4859513Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.4860112Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.4860536Z #184 _start from ??:0 2025-12-04T11:31:57.4860824Z #185 from ??:0 2025-12-04T11:31:57.4861071Z 2025-12-04T11:31:57.4861076Z 2025-12-04T11:31:57.4861289Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.4862225Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4862934Z 2025-12-04T11:31:57.4863216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.4863440Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4863602Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4865013Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4865320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4865460Z 
graph_break [] 2025-12-04T11:31:57.4865678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4865835Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4866174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4867637Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4867781Z graph_break [] 2025-12-04T11:31:57.4867922Z =================================== FAILURES =================================== 2025-12-04T11:31:57.4868208Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________ 2025-12-04T11:31:57.4868339Z Traceback (most recent call last): 2025-12-04T11:31:57.4868844Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides 2025-12-04T11:31:57.4868996Z torch.compile(fn, fullgraph=True)(x, y) 2025-12-04T11:31:57.4869480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.4869594Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4869973Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn 2025-12-04T11:31:57.4870069Z def fn(x, y): 2025-12-04T11:31:57.4870484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.4870606Z return fn(*args, **kwargs) 2025-12-04T11:31:57.4871067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 
2025-12-04T11:31:57.4871199Z return compiled_fn(full_args) 2025-12-04T11:31:57.4871789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.4871926Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.4872542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.4872659Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.4873221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.4873367Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.4873907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.4874038Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.4874589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.4874699Z outs = compiled_fn(args) 2025-12-04T11:31:57.4875160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.4875291Z return self.current_callable(inputs) 2025-12-04T11:31:57.4875700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.4875876Z out = model(new_inputs) 2025-12-04T11:31:57.4876355Z File "/tmp/tmpj7ii8w8h/fj/cfj7rfg42bnrfqziropycccddw22twn3ztyyugkctad7o4cjkzxo.py", line 232, in call 2025-12-04T11:31:57.4876725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.4876837Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.4877078Z RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.4877866Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.4877978Z C++ CapturedTraceback: 2025-12-04T11:31:57.4879314Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.4879900Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.4880229Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.4881043Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.4882294Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4884066Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4891088Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4894556Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.4895978Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4896588Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.4900796Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4901706Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.4902677Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.4906313Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.4906950Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.4907716Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.4908550Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.4913410Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.4913732Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.4914063Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.4914327Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4914585Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.4914963Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4915261Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4915550Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4915855Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4916255Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.4916631Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4917029Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4917394Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4917804Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4918164Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4918474Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4918759Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4919103Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4919516Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4919877Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4920280Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4920668Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4920922Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4921296Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4921619Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4921905Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4922272Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4922767Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4923147Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4923544Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4923904Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4924310Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4924673Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4924940Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4925067Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4925430Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4925837Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4925956Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.4926071Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.4926448Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4926699Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4927075Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4927470Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4927831Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4928129Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4928380Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4928754Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4929036Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4929289Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4929664Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4929949Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4930198Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4931088Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4931380Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4931645Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4932009Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4932291Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4932668Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4932920Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4933332Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4933583Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4933951Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4934394Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4934755Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4935154Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4935531Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4935925Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4936304Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4936699Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4937064Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4937474Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4937837Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4938136Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.4938389Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4938748Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4939102Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4939402Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4939707Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4940007Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4940414Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4940796Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4941203Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4941573Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4941848Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4942217Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4942637Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4943040Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4943444Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4943826Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4944209Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4944526Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4944815Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4945112Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.4945381Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4945748Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4946179Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4946556Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4946957Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4947337Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4947597Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4947962Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4948378Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4948745Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4949159Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4949528Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4949785Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4950165Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4950567Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4950947Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4951351Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4951723Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4952087Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4952390Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4952678Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4952996Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4953402Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4953783Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4954045Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4954413Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4954863Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4955232Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4955646Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4956043Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4956392Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4956707Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4957030Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4957341Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4957746Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4958141Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4958554Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4958922Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4959326Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4959707Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4959970Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.4960351Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4960753Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4961121Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4961536Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4961904Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4962264Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.4962656Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.4962953Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.4963267Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.4963672Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.4964059Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4964464Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4964831Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4965251Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4965619Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4966024Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4966411Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4966853Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.4967240Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.4967524Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.4967825Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.4968130Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.4968414Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.4968768Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.4969144Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.4969424Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.4969734Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.4969998Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.4970189Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.4970304Z #184 _start from ??:0 2025-12-04T11:31:57.4970424Z #185 from ??:0 2025-12-04T11:31:57.4970430Z 2025-12-04T11:31:57.4970435Z 2025-12-04T11:31:57.4970665Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.4971253Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.4971262Z 2025-12-04T11:31:57.4971523Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.4971754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4971917Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4973287Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4973590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4973687Z 
graph_break [] 2025-12-04T11:31:57.4973916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4974074Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4974383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4975853Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4975950Z graph_break [] 2025-12-04T11:31:57.4976173Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.4976329Z stats [('calls_captured', 29), ('unique_graphs', 1)] 2025-12-04T11:31:57.4976643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.4978100Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.4978235Z graph_break [] 2025-12-04T11:31:57.4979022Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml - 2025-12-04T11:31:57.4979190Z =========================== short test summary info ============================ 2025-12-04T11:31:57.4980002Z FAILED [1.4188s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: 
FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.4980738Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.4980876Z C++ CapturedTraceback: 2025-12-04T11:31:57.4982156Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.4982661Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.4982999Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.4983790Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.4985065Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4986741Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4993725Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.4997198Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.4998638Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.4999233Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5003804Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5004555Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5005519Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5009173Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5009799Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5010586Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5011407Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5016284Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5016596Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5016929Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5017194Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5017462Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5017828Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5018126Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5018424Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5018721Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5019134Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5019499Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5019896Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5020269Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5020662Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5021018Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5021331Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5021652Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5021963Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5022357Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5022717Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5023169Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5023531Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5023797Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5024190Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5024484Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5024811Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5025124Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5025534Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5025895Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5026293Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5026666Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5027060Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5027423Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5027692Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5027816Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5028190Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5028584Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5028703Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5028835Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5029197Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5029454Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5029831Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5030227Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5030605Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5030893Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5031146Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5031521Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5031808Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5032070Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5032431Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5032714Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5032982Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5033383Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5033667Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5033930Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5034292Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5034587Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5034954Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5035230Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5035609Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5035860Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5036262Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5036659Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5037023Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5037437Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5037794Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5038203Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5038568Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5038966Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5039341Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5039733Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5040090Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5040388Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5040639Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5041013Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5041350Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5041648Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5041951Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5042242Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5042744Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5043115Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5043519Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5043901Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5044159Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5044541Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5044985Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5045358Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5045772Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5046139Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5046515Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5046832Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5047156Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5047434Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5047693Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5048091Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5048506Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5048874Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5049292Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5049658Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5049913Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5050292Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5050695Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5051075Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5051480Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5051844Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5052116Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5052484Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5052885Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5053265Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5053670Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5054052Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5054400Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5054702Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5055006Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5055309Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5055721Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5056089Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5056345Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5056760Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5057166Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5057534Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5057946Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5058358Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5058718Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5059051Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5059341Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5059657Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5060091Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5060473Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5060875Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5061243Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5061661Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5062032Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5062305Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5062676Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5063077Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5063461Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5063861Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5064227Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5064584Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5064887Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5065191Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5065491Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5065898Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5066278Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5066679Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5067058Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5067458Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5067821Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5068236Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5068646Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5069062Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5069429Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5069711Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5070051Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5070317Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5070594Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5070979Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5071296Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5071592Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5071890Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5072152Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5072358Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5072457Z #184 _start from ??:0 2025-12-04T11:31:57.5072573Z #185 from ??:0 2025-12-04T11:31:57.5072596Z 2025-12-04T11:31:57.5072601Z 2025-12-04T11:31:57.5072814Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5073402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.5073411Z 2025-12-04T11:31:57.5073687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5073867Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:31:57.5074075Z ================== 1 failed, 31 deselected, 2 rerun in 23.31s ================== 2025-12-04T11:31:57.5074172Z Got exit code 1 2025-12-04T11:31:57.5074686Z FAILED CONSISTENTLY: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda 2025-12-04T11:31:57.5075102Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:31:57.5075701Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml 2025-12-04T11:31:57.5075862Z ============================= test session starts ============================== 2025-12-04T11:31:57.5076218Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.5076325Z cachedir: .pytest_cache 2025-12-04T11:31:57.5076847Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.5076972Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.5077075Z configfile: pytest.ini 2025-12-04T11:31:57.5077667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.5077885Z collecting ... collected 32 items / 17 deselected / 15 selected 2025-12-04T11:31:57.5078024Z stepcurrent: skipping 17 already run items. 
2025-12-04T11:31:57.5078153Z Running 15 items in this shard 2025-12-04T11:31:57.5078158Z 2025-12-04T11:31:57.5078650Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda ('RERUN', {'yellow': True}) [4.5453s] [ 6%] 2025-12-04T11:31:57.5079150Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda ('RERUN', {'yellow': True}) [1.5071s] [ 6%] 2025-12-04T11:31:57.5079579Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda FAILED [1.3130s] [ 6%] 2025-12-04T11:31:57.5079587Z 2025-12-04T11:31:57.5079727Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.5079997Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5080114Z Traceback (most recent call last): 2025-12-04T11:31:57.5080539Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5080694Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5081174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5081295Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5081760Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5081855Z def fn(x): 2025-12-04T11:31:57.5082291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5082526Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5083000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5096109Z return compiled_fn(full_args) 2025-12-04T11:31:57.5096828Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5096995Z all_outs = 
call_func_at_runtime_with_args( 2025-12-04T11:31:57.5097603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5097726Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5098303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5098435Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5099002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5099120Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5099665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5099793Z outs = compiled_fn(args) 2025-12-04T11:31:57.5100249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5100390Z return self.current_callable(inputs) 2025-12-04T11:31:57.5100783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5101100Z out = model(new_inputs) 2025-12-04T11:31:57.5101594Z File "/tmp/tmpbxclaczo/tb/ctbqebvmruj4nkytdlerbrxeyr4bumhrd3m254oilw7ylx6twan5.py", line 227, in call 2025-12-04T11:31:57.5101955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5102071Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5102324Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5102330Z 2025-12-04T11:31:57.5102540Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5103063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5103069Z 2025-12-04T11:31:57.5103331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5103552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5103731Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5105220Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5105545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5105642Z graph_break [] 2025-12-04T11:31:57.5105897Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5106090Z Traceback (most recent call last): 2025-12-04T11:31:57.5106504Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5106630Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5107168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5107275Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5107653Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5107792Z def fn(x): 2025-12-04T11:31:57.5108207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 
2025-12-04T11:31:57.5108328Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5108783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5108898Z return compiled_fn(full_args) 2025-12-04T11:31:57.5109500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5109637Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5110244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5110357Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5110913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5111060Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5111598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5111713Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5112273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5112383Z outs = compiled_fn(args) 2025-12-04T11:31:57.5112844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5112972Z return self.current_callable(inputs) 2025-12-04T11:31:57.5113365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5113487Z out = model(new_inputs) 2025-12-04T11:31:57.5113954Z File "/tmp/tmpdz9eef4b/27/c27yqyxljiol2gqdvi4ib2hnzms6hh5nu6tdhe5dae575qpbziz5.py", line 227, in call 2025-12-04T11:31:57.5114322Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5114435Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5114675Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.5114683Z 2025-12-04T11:31:57.5114906Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5115404Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5115413Z 2025-12-04T11:31:57.5115676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5115906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5116103Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5117479Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5117825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5117924Z graph_break [] 2025-12-04T11:31:57.5118153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5118312Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5118656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5120005Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), 
('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5120149Z graph_break [] 2025-12-04T11:31:57.5120306Z =================================== FAILURES =================================== 2025-12-04T11:31:57.5120568Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5120700Z Traceback (most recent call last): 2025-12-04T11:31:57.5121114Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5121246Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5121736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5121846Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5122210Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5122319Z def fn(x): 2025-12-04T11:31:57.5122833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5122960Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5123421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5123540Z return compiled_fn(full_args) 2025-12-04T11:31:57.5124141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5124281Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5124892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5125014Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5125576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 
2025-12-04T11:31:57.5125728Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5126271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5126385Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5126941Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5127049Z outs = compiled_fn(args) 2025-12-04T11:31:57.5127509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5127635Z return self.current_callable(inputs) 2025-12-04T11:31:57.5128089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5128211Z out = model(new_inputs) 2025-12-04T11:31:57.5128681Z File "/tmp/tmpdu2grb28/xe/cxet3htjci5kxwcdyfvvf4robtutuvgi2ijy7r2fmo3f6oiavm5f.py", line 227, in call 2025-12-04T11:31:57.5129037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5129164Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5129402Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5129438Z 2025-12-04T11:31:57.5129665Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5130157Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5130191Z 2025-12-04T11:31:57.5130454Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5130679Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5130870Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5132226Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5132528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5132626Z graph_break [] 2025-12-04T11:31:57.5132848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5133009Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5133317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5134664Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5134760Z graph_break [] 2025-12-04T11:31:57.5134984Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:31:57.5135140Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5135444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5136907Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5137009Z graph_break [] 2025-12-04T11:31:57.5137789Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml - 2025-12-04T11:31:57.5137959Z =========================== short test summary info ============================ 2025-12-04T11:31:57.5138663Z FAILED [1.3130s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer. 2025-12-04T11:31:57.5138669Z 2025-12-04T11:31:57.5138879Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5139374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5139396Z 2025-12-04T11:31:57.5139655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5139858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:31:57.5140070Z ================== 1 failed, 17 deselected, 2 rerun in 7.40s =================== 2025-12-04T11:31:57.5140170Z Got exit code 1 2025-12-04T11:31:57.5140273Z Retrying single test... 
2025-12-04T11:31:57.5140879Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml 2025-12-04T11:31:57.5141065Z ============================= test session starts ============================== 2025-12-04T11:31:57.5141423Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.5141528Z cachedir: .pytest_cache 2025-12-04T11:31:57.5142067Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.5142199Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.5142304Z configfile: pytest.ini 2025-12-04T11:31:57.5142883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.5143136Z collecting ... collected 32 items / 31 deselected / 1 selected 2025-12-04T11:31:57.5143714Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda 2025-12-04T11:31:57.5143839Z Running 1 items in this shard 2025-12-04T11:31:57.5143844Z 2025-12-04T11:31:57.5144701Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:33.730386338 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T11:31:57.5145212Z [W1204 11:30:33.730410993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:31:57.5145229Z 2025-12-04T11:31:57.5145358Z ('RERUN', {'yellow': True}) [20.0918s] [100%] 2025-12-04T11:31:57.5146250Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:50.640929219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.5146256Z 2025-12-04T11:31:57.5146396Z ('RERUN', {'yellow': True}) [1.3430s] [100%] 2025-12-04T11:31:57.5147283Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:51.970355486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.5147288Z 2025-12-04T11:31:57.5147401Z FAILED [1.3269s] [100%] 2025-12-04T11:31:57.5147406Z 2025-12-04T11:31:57.5147543Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.5147801Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5147930Z Traceback (most recent call last): 2025-12-04T11:31:57.5148343Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5148472Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5148961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5149070Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5149446Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5149540Z def fn(x): 2025-12-04T11:31:57.5149956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5150076Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5150542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 
2025-12-04T11:31:57.5150671Z return compiled_fn(full_args) 2025-12-04T11:31:57.5151285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5151426Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5152034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5152150Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5152734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5152882Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5153422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5153581Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5154130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5154271Z outs = compiled_fn(args) 2025-12-04T11:31:57.5154734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5154860Z return self.current_callable(inputs) 2025-12-04T11:31:57.5155255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5155374Z out = model(new_inputs) 2025-12-04T11:31:57.5155851Z File "/tmp/tmpsehuk76x/mm/cmmol6g33c64qnicaudgkpdgbxfisiphdlux2cdngrz2csklmdql.py", line 227, in call 2025-12-04T11:31:57.5156219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5156334Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5156571Z RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.5157322Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5157432Z C++ CapturedTraceback: 2025-12-04T11:31:57.5158723Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5159199Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5159523Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5160327Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5161581Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5163376Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5170403Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5173901Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5175273Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5175890Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5180151Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5180878Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5181886Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5185491Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5186152Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5186884Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5187725Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5192551Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5192830Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5193150Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5193430Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5193684Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5194093Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5194396Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5194684Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5194993Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5195425Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5195801Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5196195Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5196585Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5196990Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5197383Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5197691Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5197972Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5198264Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5198671Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5199033Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5199430Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5199803Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5200061Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5200435Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5200730Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5201259Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5201572Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5201967Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5202345Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5202822Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5203186Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5203595Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5203959Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5204209Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5204349Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5204709Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5205117Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5205238Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5205355Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5205732Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5206060Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5206437Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5206828Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5207187Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5207526Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5207778Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5208137Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5208476Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5208725Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5209145Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5209429Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5209678Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5210052Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5210336Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5210595Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5210959Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5211208Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5211583Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5211840Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5212202Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5212466Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5212833Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5213245Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5213605Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5214001Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5214372Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5214769Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5215140Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5215530Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5215893Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5216296Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5216657Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5216958Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5217208Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5217595Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5217951Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5218247Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5218530Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5218880Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5219286Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5219668Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5220101Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5220471Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5220775Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5221147Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5221557Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5221925Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5222324Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5222704Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5223055Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5223364Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5223666Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5223931Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5224199Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5224567Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5224970Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5225353Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5225757Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5226135Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5226395Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5226758Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5227173Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5227537Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5227950Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5228315Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5228571Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5228950Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5229394Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5229766Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5230179Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5230551Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5230939Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5231247Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5231572Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5231886Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5232295Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5232710Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5232968Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5233338Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5233760Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5234129Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5234543Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5234917Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5235269Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5235590Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5235885Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5236185Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5236605Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5236974Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5237388Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5237757Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5238158Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5238542Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5238804Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5239182Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5239587Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5239951Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5240366Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5240733Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5241092Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5241422Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5241713Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5242022Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5242519Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5242928Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5243351Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5243751Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5244170Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5244570Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5244970Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5245350Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5245750Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5246133Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5246419Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5246720Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5246995Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5247279Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5247636Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5247955Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5248239Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5248516Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5248776Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5248967Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5249081Z #184 _start from ??:0 2025-12-04T11:31:57.5249198Z #185 from ??:0 2025-12-04T11:31:57.5249204Z 2025-12-04T11:31:57.5249210Z 2025-12-04T11:31:57.5249436Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5249937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5249945Z 2025-12-04T11:31:57.5250219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5250436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5250596Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5251965Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5252273Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5252384Z graph_break [] 
2025-12-04T11:31:57.5252670Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5252793Z Traceback (most recent call last): 2025-12-04T11:31:57.5253222Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5253351Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5253830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5253985Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5254348Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5254454Z def fn(x): 2025-12-04T11:31:57.5254870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5255007Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5255479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5255629Z return compiled_fn(full_args) 2025-12-04T11:31:57.5256213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5256367Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5256962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5257092Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5257641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5257775Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5258331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 
2025-12-04T11:31:57.5258446Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5258998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5259118Z outs = compiled_fn(args) 2025-12-04T11:31:57.5259565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5259704Z return self.current_callable(inputs) 2025-12-04T11:31:57.5260099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5260201Z out = model(new_inputs) 2025-12-04T11:31:57.5260665Z File "/tmp/tmp_axzmm2t/lf/clfxqxcsmsfumbhzv7b3bld7fpumd3g7khb5qhbxg4xoqjclpb7f.py", line 227, in call 2025-12-04T11:31:57.5261023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5261146Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5261385Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5262114Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5262236Z C++ CapturedTraceback: 2025-12-04T11:31:57.5263519Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5264002Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5264335Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5265167Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5266460Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5268137Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5275155Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5278590Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5280003Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5280605Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5284930Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5285724Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5286679Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5290269Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5290890Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5291628Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5292445Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5297404Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5297685Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5298030Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5298293Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5298594Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5298957Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5299260Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5299547Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5299841Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5300252Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5300614Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5301187Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5301565Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5301958Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5302326Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5302620Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5302904Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5303210Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5303603Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5303971Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5304363Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5304724Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5304986Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5305346Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5305650Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5305940Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5306234Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5306637Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5306997Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5307453Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5307829Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5308222Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5308590Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5308881Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5309003Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5309380Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5309814Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5309943Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5310059Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5310474Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5310736Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5311096Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5311486Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5311858Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5312141Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5312405Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5312763Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5313046Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5313311Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5313673Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5313963Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5314216Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5314576Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5314867Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5315119Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5315477Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5315736Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5316095Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5316350Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5316708Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5316956Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5317325Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5317715Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5318096Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5318514Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5318876Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5319280Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5319638Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5320060Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5320430Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5320819Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5321221Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5321506Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5321784Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5322153Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5322587Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5322900Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5323181Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5323472Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5323893Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5324261Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5324675Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5325041Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5325300Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5325680Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5326082Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5326448Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5326859Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5327222Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5327580Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5327883Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5328178Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5328450Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5328707Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5329087Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5329494Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5329862Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5330310Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5330680Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5330935Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5331311Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5331741Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5332119Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5332518Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5332914Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5333179Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5333578Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5333986Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5334350Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5334750Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5335124Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5335472Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5335789Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5336075Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5336374Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5336784Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5337146Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5337402Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5337781Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5338181Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5338560Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5338960Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5339328Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5339688Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5339989Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5340289Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5340589Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5340992Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5341370Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5341770Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5342177Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5342577Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5342944Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5343218Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5343615Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5344022Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5344400Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5344835Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5345222Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5345601Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5345907Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5346211Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5346514Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5346930Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5347299Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5347705Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5348087Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5348494Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5348879Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5349280Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5349650Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5350066Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5350432Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5350737Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5351033Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5351300Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5351591Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5351931Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5352250Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5352547Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5352809Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5353085Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5353279Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5353380Z #184 _start from ??:0 2025-12-04T11:31:57.5353509Z #185 from ??:0 2025-12-04T11:31:57.5353517Z 2025-12-04T11:31:57.5353552Z 2025-12-04T11:31:57.5353765Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5354263Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5354283Z 2025-12-04T11:31:57.5354543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5354787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5354956Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5356300Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5356655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5356776Z graph_break [] 
2025-12-04T11:31:57.5356984Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5357156Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5357448Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5358929Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5359022Z graph_break [] 2025-12-04T11:31:57.5359159Z =================================== FAILURES =================================== 2025-12-04T11:31:57.5359432Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5359551Z Traceback (most recent call last): 2025-12-04T11:31:57.5359964Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5360101Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5360575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5360696Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5361053Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5361143Z def fn(x): 2025-12-04T11:31:57.5361570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5361674Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5362132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5362257Z return compiled_fn(full_args) 
2025-12-04T11:31:57.5362953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5363103Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5363699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5363810Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5364369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5364500Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5365051Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5365203Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5365745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5365863Z outs = compiled_fn(args) 2025-12-04T11:31:57.5366307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5366432Z return self.current_callable(inputs) 2025-12-04T11:31:57.5366862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5366967Z out = model(new_inputs) 2025-12-04T11:31:57.5367457Z File "/tmp/tmpzrap4p58/ai/caigvbcolirtxl4f37pdfnsretrfde6fwtpjzyg4qdu2djzyyzek.py", line 227, in call 2025-12-04T11:31:57.5367842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5367952Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5368235Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5368969Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5369075Z C++ CapturedTraceback: 2025-12-04T11:31:57.5370362Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5370828Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5371162Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5371951Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5373215Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5374879Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5381886Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5385340Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5386760Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5387360Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5391595Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5392336Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5393290Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5396938Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5397584Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5398357Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5399171Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5404334Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5404612Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5404946Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5405207Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5405476Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5405843Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5406138Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5406435Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5406729Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5407122Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5407501Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5407894Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5408335Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5408733Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5409091Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5409398Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5409737Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5410045Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5410433Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5410832Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5411240Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5411632Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5411893Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5412252Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5412547Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5412843Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5413133Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5413528Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5413893Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5414288Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5414663Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5415051Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5415409Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5415676Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5415795Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5416166Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5416562Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5416682Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5416807Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5417167Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5417421Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5417786Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5418181Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5418551Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5418834Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5419084Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5419454Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5419765Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5420026Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5420384Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5420664Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5420946Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5421306Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5421589Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5421877Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5422230Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5422517Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5422879Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5423123Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5423501Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5423747Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5424113Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5424509Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5424867Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5425278Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5425639Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5426026Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5426393Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5426783Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5427149Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5427543Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5427900Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5428194Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5428443Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5428812Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5429149Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5429441Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5429736Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5430030Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5430456Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5430827Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5431265Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5431650Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5431907Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5432274Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5432719Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5433088Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5433537Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5433906Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5434285Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5434600Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5434893Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5435176Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5435438Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5435809Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5436230Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5436606Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5437025Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5437395Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5437653Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5438040Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5438447Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5438814Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5439232Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5439603Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5439879Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5440248Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5440651Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5441031Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5441434Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5441819Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5442165Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5442551Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5442857Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5443203Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5443622Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5443989Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5444247Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5444660Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5445064Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5445463Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5445877Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5446277Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5446636Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5446935Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5447224Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5447535Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5447940Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5448327Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5448728Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5449095Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5449507Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5449873Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5450140Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5450509Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5450910Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5451290Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5451690Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5452060Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5452423Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5452726Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5453033Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5453335Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5453737Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5454115Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5454521Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5454946Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5455354Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5455720Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5456133Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5456529Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5456948Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5457322Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5457639Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5457958Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5458253Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5458534Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5458890Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5459209Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5459507Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5459772Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5460038Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5460246Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5460344Z #184 _start from ??:0 2025-12-04T11:31:57.5460463Z #185 from ??:0 2025-12-04T11:31:57.5460487Z 2025-12-04T11:31:57.5460492Z 2025-12-04T11:31:57.5460710Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5461212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5461217Z 2025-12-04T11:31:57.5461494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5461718Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5461877Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5463247Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5463555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5463665Z graph_break [] 
2025-12-04T11:31:57.5463882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5464040Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5464349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5465814Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5465924Z graph_break [] 2025-12-04T11:31:57.5466135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5466325Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5466634Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5468128Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5468239Z graph_break [] 2025-12-04T11:31:57.5469016Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml - 2025-12-04T11:31:57.5469236Z =========================== short test summary info ============================ 2025-12-04T11:31:57.5469927Z FAILED [1.3269s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.5470694Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5470814Z C++ CapturedTraceback: 2025-12-04T11:31:57.5472085Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5472573Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5472897Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5473687Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5474954Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5476630Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5483767Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5487228Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5488652Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5489254Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5493475Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5494217Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5495179Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5498914Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5499566Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5500352Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5501357Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5506164Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5506446Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5506762Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5507027Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5507299Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5507664Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5507977Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5508263Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5508556Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5508968Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5509333Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5509738Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5510166Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5510558Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5510934Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5511233Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5511558Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5511865Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5512300Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5512668Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5513064Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5513468Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5513732Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5514094Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5514406Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5514689Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5514981Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5515391Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5515752Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5516165Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5516522Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5516919Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5517294Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5517545Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5517666Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5518046Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5518441Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5518572Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5518689Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5519052Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5519321Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5519681Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5520076Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5520446Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5520734Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5520997Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5521391Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5521679Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5521941Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5522304Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5522698Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5522991Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5523357Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5523688Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5523936Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5524316Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5524596Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5524956Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5525221Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5525582Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5525829Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5526209Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5526611Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5526986Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5527386Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5527747Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5528152Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5528512Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5528921Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5529284Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5529681Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5530055Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5530347Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5530599Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5530974Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5531317Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5531632Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5531918Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5532214Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5532643Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5533045Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5533471Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5533846Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5534106Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5534519Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5534928Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5535309Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5535736Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5536106Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5536500Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5536806Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5537096Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5537378Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5537637Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5538022Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5538426Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5538794Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5539209Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5539577Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5539850Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5540215Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5540617Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5540997Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5541399Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5541780Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5542039Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5542408Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5542818Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5543182Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5543583Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5543962Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5544308Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5544624Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5544945Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5545248Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5545666Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5546033Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5546331Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5546699Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5547099Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5547528Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5547933Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5548345Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5548690Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5548991Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5549301Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5549601Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5550006Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5550390Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5550792Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5551176Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5551577Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5551945Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5552216Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5552589Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5552999Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5553367Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5553765Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5554149Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5554494Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5554805Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5555094Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5555393Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5555806Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5556177Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5556576Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5556985Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5557391Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5557769Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5558172Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5558569Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5558985Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5559382Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5559680Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5559981Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5560277Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5560565Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5560909Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5561228Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5561522Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5561790Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5562068Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5562262Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5562360Z #184 _start from ??:0 2025-12-04T11:31:57.5562582Z #185 from ??:0 2025-12-04T11:31:57.5562590Z 2025-12-04T11:31:57.5562594Z 2025-12-04T11:31:57.5562811Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5563333Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5563339Z 2025-12-04T11:31:57.5563601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5563782Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:31:57.5563992Z ================== 1 failed, 31 deselected, 2 rerun in 22.80s ================== 2025-12-04T11:31:57.5564091Z Got exit code 1 2025-12-04T11:31:57.5564195Z Retrying single test... 
2025-12-04T11:31:57.5564812Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml 2025-12-04T11:31:57.5564971Z ============================= test session starts ============================== 2025-12-04T11:31:57.5565334Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.5565443Z cachedir: .pytest_cache 2025-12-04T11:31:57.5565946Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.5566081Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.5566186Z configfile: pytest.ini 2025-12-04T11:31:57.5566772Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.5566984Z collecting ... collected 32 items / 31 deselected / 1 selected 2025-12-04T11:31:57.5567567Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda 2025-12-04T11:31:57.5567693Z Running 1 items in this shard 2025-12-04T11:31:57.5567741Z 2025-12-04T11:31:57.5568595Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:08.963280782 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor) 2025-12-04T11:31:57.5569119Z [W1204 11:31:08.963304307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:31:57.5569125Z 2025-12-04T11:31:57.5569284Z ('RERUN', {'yellow': True}) [20.3320s] [100%] 2025-12-04T11:31:57.5570177Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:25.144290228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.5570213Z 2025-12-04T11:31:57.5570351Z ('RERUN', {'yellow': True}) [1.3565s] [100%] 2025-12-04T11:31:57.5571242Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:26.435756544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:31:57.5571276Z 2025-12-04T11:31:57.5571389Z FAILED [1.2890s] [100%] 2025-12-04T11:31:57.5571394Z 2025-12-04T11:31:57.5571535Z ==================================== RERUNS ==================================== 2025-12-04T11:31:57.5571796Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5571928Z Traceback (most recent call last): 2025-12-04T11:31:57.5572341Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5572479Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5572962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5573071Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5573445Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5573541Z def fn(x): 2025-12-04T11:31:57.5573960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5574077Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5574538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 
2025-12-04T11:31:57.5574666Z return compiled_fn(full_args) 2025-12-04T11:31:57.5575257Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5575397Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5576011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5576125Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5576700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5576835Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5577379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5577509Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5578063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5578173Z outs = compiled_fn(args) 2025-12-04T11:31:57.5578640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5578770Z return self.current_callable(inputs) 2025-12-04T11:31:57.5579180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5579287Z out = model(new_inputs) 2025-12-04T11:31:57.5579796Z File "/tmp/tmpvh293qd0/z7/cz75itwfjcnm4yvpxo35zryxuqyb7drx2ljgdlwurbj2o2ooh7ar.py", line 227, in call 2025-12-04T11:31:57.5580166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5580281Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5580521Z RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.5581295Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5581406Z C++ CapturedTraceback: 2025-12-04T11:31:57.5582691Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5583227Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5583566Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5584357Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5585608Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5587300Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5594276Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5597753Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5599189Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5599791Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5604359Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5605094Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5606068Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5609750Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5610403Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5611178Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5612018Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5616918Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5617251Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5617586Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5617854Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5618111Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5618501Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5618804Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5619094Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5619401Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5619803Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5620184Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5620582Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5620944Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5621360Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5621724Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5622036Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5622328Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5622623Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5623072Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5623437Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5623844Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5624239Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5624496Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5624874Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5625199Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5625483Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5625794Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5626223Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5626607Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5627000Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5627360Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5627766Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5628128Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5628399Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5628519Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5628883Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5629294Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5629412Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5629525Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5629900Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5630151Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5630522Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5630918Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5631278Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5631578Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5631829Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5632201Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5632484Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5632734Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5633106Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5633386Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5633639Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5634013Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5634331Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5634597Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5634961Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5635209Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5635610Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5635861Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5636233Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5636511Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5636877Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5637317Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5637677Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5638072Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5638449Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5638843Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5639214Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5639611Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5639975Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5640380Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5640739Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5641035Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5641288Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5641650Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5642003Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5642302Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5642702Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5643001Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5643409Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5643790Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5644192Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5644564Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5644838Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5645209Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5645626Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5646051Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5646455Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5646838Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5647188Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5647535Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5647829Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5648096Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5648396Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5648768Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5649216Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5649584Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5649987Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5650366Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5650625Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5650994Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5651411Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5651777Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5652193Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5652560Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5652817Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5653200Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5653603Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5653980Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5654385Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5654750Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5655112Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5655415Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5655721Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5656023Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5656425Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5656803Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5657061Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5657429Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5657877Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5658246Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5658660Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5659025Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5659399Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5659716Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5660043Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5660354Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5660758Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5661156Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5661568Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5661934Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5662350Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5662713Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5662969Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5663350Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5663751Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5664120Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5664531Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5664898Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5665259Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5665559Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5665849Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5666161Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5666560Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5666944Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5667347Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5667714Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5668129Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5668495Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5668950Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5669317Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5669754Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5670139Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5670423Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5670724Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5671000Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5671316Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5671678Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5672030Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5672318Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5672603Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5672900Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5673107Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5673207Z #184 _start from ??:0 2025-12-04T11:31:57.5673325Z #185 from ??:0 2025-12-04T11:31:57.5673331Z 2025-12-04T11:31:57.5673336Z 2025-12-04T11:31:57.5673566Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5674070Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5674076Z 2025-12-04T11:31:57.5674337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5674572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5674733Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5676113Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5676420Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5676532Z graph_break [] 
2025-12-04T11:31:57.5676792Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5676913Z Traceback (most recent call last): 2025-12-04T11:31:57.5677340Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5677470Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5677952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5678080Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5678442Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5678535Z def fn(x): 2025-12-04T11:31:57.5678966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5679074Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5679549Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5679664Z return compiled_fn(full_args) 2025-12-04T11:31:57.5680252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5680405Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5681045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5681181Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5681736Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5681869Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5682574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 
2025-12-04T11:31:57.5682697Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5683243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5683399Z outs = compiled_fn(args) 2025-12-04T11:31:57.5683850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5683990Z return self.current_callable(inputs) 2025-12-04T11:31:57.5684520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5684628Z out = model(new_inputs) 2025-12-04T11:31:57.5685117Z File "/tmp/tmp0q8t7vtj/37/c37ypiaxo5cnfzthmtsb4kk4r2dlmjwvcr4olm7aszoktcoqoufn.py", line 227, in call 2025-12-04T11:31:57.5685476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5685595Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5685851Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5686595Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5686721Z C++ CapturedTraceback: 2025-12-04T11:31:57.5687997Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5688489Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5688819Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5689610Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5690886Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5692562Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5699570Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5703229Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5704612Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5705215Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5709418Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5710239Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5711202Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5714842Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5715527Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5716262Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5717084Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5721920Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5722202Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5722590Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5722856Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5723127Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5723496Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5723809Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5724094Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5724433Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5724847Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5725209Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5725609Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5726017Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5726414Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5726821Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5727118Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5727407Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5727745Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5728142Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5728519Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5728916Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5729277Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5729546Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5729906Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5730205Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5730506Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5730801Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5731213Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5731577Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5731974Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5732346Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5732744Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5733118Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5733377Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5733498Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5733871Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5734262Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5734384Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5734510Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5734876Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5735143Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5735504Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5735927Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5736301Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5736585Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5736847Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5737236Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5737522Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5737784Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5738189Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5738472Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5738736Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5739141Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5739435Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5739684Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5740044Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5740306Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5740669Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5740933Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5741295Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5741549Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5741924Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5742324Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5742685Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5743096Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5743456Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5743867Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5744228Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5744621Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5744995Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5745387Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5745758Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5746041Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5746291Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5746663Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5747004Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5747345Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5747632Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5747925Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5748347Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5748745Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5749152Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5749532Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5749820Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5750199Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5750640Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5751010Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5751422Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5751789Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5752152Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5752452Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5752744Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5753023Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5753281Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5753664Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5754064Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5754432Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5754847Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5755213Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5755472Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5755849Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5756250Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5756636Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5757038Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5757405Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5757676Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5758043Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5758456Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5758829Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5759261Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5759644Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5759989Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5760305Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5760633Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5760934Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5761351Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5761752Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5762010Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5762534Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5762943Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5763324Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5763725Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5764092Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5764452Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5764755Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5765061Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5765364Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5765765Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5766147Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5766545Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5766928Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5767327Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5767697Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5767966Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5768334Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5768737Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5769117Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5769517Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5769899Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5770244Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5770565Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5770869Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5771209Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5771629Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5771998Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5772406Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5772818Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5773222Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5773636Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5774035Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5774407Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5774856Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5775221Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5775508Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5775826Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5776091Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5776386Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5776734Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5777053Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5777356Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5777624Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5777900Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5778092Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5778194Z #184 _start from ??:0 2025-12-04T11:31:57.5778335Z #185 from ??:0 2025-12-04T11:31:57.5778340Z 2025-12-04T11:31:57.5778345Z 2025-12-04T11:31:57.5778562Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5779061Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5779084Z 2025-12-04T11:31:57.5779346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5779567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5779743Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5781102Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5781425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5781526Z graph_break [] 
2025-12-04T11:31:57.5781740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5781916Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5782212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5783709Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5783820Z graph_break [] 2025-12-04T11:31:57.5784004Z =================================== FAILURES =================================== 2025-12-04T11:31:57.5784273Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________ 2025-12-04T11:31:57.5784395Z Traceback (most recent call last): 2025-12-04T11:31:57.5784848Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa 2025-12-04T11:31:57.5784995Z torch.compile(fn, fullgraph=True)(x) 2025-12-04T11:31:57.5785483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper 2025-12-04T11:31:57.5785641Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5786008Z File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn 2025-12-04T11:31:57.5786100Z def fn(x): 2025-12-04T11:31:57.5786532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn 2025-12-04T11:31:57.5786643Z return fn(*args, **kwargs) 2025-12-04T11:31:57.5787104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward 2025-12-04T11:31:57.5787231Z return compiled_fn(full_args) 
2025-12-04T11:31:57.5787819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper 2025-12-04T11:31:57.5787975Z all_outs = call_func_at_runtime_with_args( 2025-12-04T11:31:57.5788573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args 2025-12-04T11:31:57.5788694Z out = normalize_as_list(f(args)) 2025-12-04T11:31:57.5789260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__ 2025-12-04T11:31:57.5789391Z return self.compiled_fn(*args, **kwargs) 2025-12-04T11:31:57.5789937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper 2025-12-04T11:31:57.5790066Z return compiled_fn(runtime_args) 2025-12-04T11:31:57.5790614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn 2025-12-04T11:31:57.5790738Z outs = compiled_fn(args) 2025-12-04T11:31:57.5791187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__ 2025-12-04T11:31:57.5791311Z return self.current_callable(inputs) 2025-12-04T11:31:57.5791724Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run 2025-12-04T11:31:57.5791831Z out = model(new_inputs) 2025-12-04T11:31:57.5792326Z File "/tmp/tmp5ecai3s6/jl/cjlyyplawpcwhafdzduzmiv34giqznfuxr7m7doagkglbwv2n7uy.py", line 227, in call 2025-12-04T11:31:57.5792685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:31:57.5792801Z return self._op(*args, **kwargs) 2025-12-04T11:31:57.5793055Z RuntimeError: FlashAttention only supports Ampere GPUs or newer. 
2025-12-04T11:31:57.5793787Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5793898Z C++ CapturedTraceback: 2025-12-04T11:31:57.5795219Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5795693Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5796029Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5796845Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5798141Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5799855Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5807171Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5810667Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5812088Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5812689Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5816947Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5817715Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5818687Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5822279Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5822905Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5823646Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5824466Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5829341Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5829664Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5829995Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5830262Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5830529Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5830894Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5831197Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5831502Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5831800Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5832200Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5832576Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5832973Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5833347Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5833742Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5834102Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5834411Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5834698Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5835003Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5835396Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5835754Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5836165Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5836529Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5836782Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5837155Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5837497Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5837797Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5838091Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5838488Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5838891Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5839285Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5839661Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5840084Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5840445Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5840737Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5840858Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5841231Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5841625Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5841743Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5841874Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5842235Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5842546Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5842923Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5843320Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5843697Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5843986Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5844236Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5844610Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5844894Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5845141Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5845517Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5845798Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5846062Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5846422Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5846701Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5846962Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5847324Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5847585Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5847946Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5848196Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5848567Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5848862Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5849225Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5849630Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5849990Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5850427Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5850788Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5851212Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5851588Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5852016Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5852392Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5852788Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5853145Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5853443Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5853694Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5854067Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5854405Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5854705Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5855008Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5855302Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5855708Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5856090Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5856490Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5862410Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5862757Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5863143Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5863582Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5863957Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5864377Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5864749Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5865097Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5865417Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5865712Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5865981Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5866342Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5866717Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5867137Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5867506Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5867947Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5868331Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5868627Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5869004Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5869408Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5869820Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5870235Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5870601Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5870862Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5871240Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5871641Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5872027Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5872419Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5872785Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5873151Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5873453Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5873747Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5874059Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5874461Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5874844Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5875101Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5875472Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5875884Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5876249Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5876664Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5877031Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5877376Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5877693Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5877983Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5878333Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5878736Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5879102Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5879555Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5879927Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5880328Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5880737Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5880994Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5881427Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5881830Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5882193Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5882709Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5883076Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5883435Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5883737Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5884028Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5884341Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5884745Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5885128Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5885530Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5885899Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5886314Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5886683Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5887081Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5887465Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5887867Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5888250Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5888537Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5888841Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5889119Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5889398Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5889757Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5890077Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5890397Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5890675Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5890937Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5891131Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5891245Z #184 _start from ??:0 2025-12-04T11:31:57.5891393Z #185 from ??:0 2025-12-04T11:31:57.5891402Z 2025-12-04T11:31:57.5891407Z 2025-12-04T11:31:57.5891635Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5892239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5892244Z 2025-12-04T11:31:57.5892506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5892774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5892941Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5894328Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5894632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5894730Z graph_break [] 
2025-12-04T11:31:57.5894961Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5895118Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5895430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5896906Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5897002Z graph_break [] 2025-12-04T11:31:57.5897228Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:31:57.5897382Z stats [('calls_captured', 16), ('unique_graphs', 1)] 2025-12-04T11:31:57.5897689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:31:57.5899152Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:31:57.5899263Z graph_break [] 2025-12-04T11:31:57.5900038Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml - 2025-12-04T11:31:57.5900206Z =========================== short test summary info ============================ 2025-12-04T11:31:57.5901101Z FAILED [1.2890s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports 
Ampere GPUs or newer. 2025-12-04T11:31:57.5901836Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first): 2025-12-04T11:31:57.5901948Z C++ CapturedTraceback: 2025-12-04T11:31:57.5903318Z #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:31:57.5903795Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:31:57.5904179Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0 2025-12-04T11:31:57.5904966Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional&, std::optional&, float, float, bool, int, int, float, bool, std::optional) from ??:0 2025-12-04T11:31:57.5906282Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, long, long, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5908007Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5914972Z #10 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&), &at::(anonymous namespace)::(anonymous 
namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple, c10::guts::typelist::typelist const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&> >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5918433Z #11 std::tuple c10::callUnboxedKernelFunction, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional&&, std::optional&&, std::optional&&, std::optional const&, std::optional const&) [clone .isra.0] from Operators_0.cpp:0 2025-12-04T11:31:57.5919841Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional const&, std::optional const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional, std::optional, std::optional, std::optional const&, std::optional const&) from ??:0 2025-12-04T11:31:57.5920949Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 
2025-12-04T11:31:57.5925264Z #14 c10::impl::wrap_kernel_functor_unboxed_ (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, std::tuple (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from RegisterCUDA_0.cpp:0 2025-12-04T11:31:57.5926046Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from ??:0 2025-12-04T11:31:57.5926997Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional) from VariableType_1.cpp:0 2025-12-04T11:31:57.5930597Z #17 c10::impl::make_boxed_from_unboxed_functor (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple, c10::guts::typelist::typelist > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector >*) from VariableType_1.cpp:0 2025-12-04T11:31:57.5931220Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector >*) const [clone .isra.0] from register_c10_ops.cpp:0 2025-12-04T11:31:57.5931957Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef >, pybind11::args const&, pybind11::kwargs const&, std::optional) from :0 2025-12-04T11:31:57.5932805Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, 
bool, std::optional) from :0 2025-12-04T11:31:57.5937680Z #21 pybind11::cpp_function::initialize, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&)#218}::operator()(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0 2025-12-04T11:31:57.5938014Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0 2025-12-04T11:31:57.5938343Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543 2025-12-04T11:31:57.5938609Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5938876Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917 2025-12-04T11:31:57.5939245Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5939547Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5939853Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5940150Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5940563Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 
2025-12-04T11:31:57.5940927Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5941326Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5941701Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5942095Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5942459Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5942775Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5943060Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5943366Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5943759Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5944120Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5944532Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5944892Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5945158Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5945520Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5945859Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5946155Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5946450Z 
#47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5946843Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5947242Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5947639Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5948040Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5948433Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5948793Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5949086Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5949207Z #55 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5949579Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5949974Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5950095Z #58 dynamo_eval_custom_code from ??:0 2025-12-04T11:31:57.5950223Z #59 dynamo__custom_eval_frame from :0 2025-12-04T11:31:57.5950585Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5950840Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5951215Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5951613Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5951991Z #64 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5952277Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5952527Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5952901Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5953186Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5953451Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5953812Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5954098Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5954362Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5954722Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5955002Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5955264Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5955626Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5955888Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5956249Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5956497Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5956904Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5957160Z #81 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5957531Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5957930Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5958317Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5958728Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5959090Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5959530Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5959896Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5960316Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5960689Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5961085Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5961445Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5961744Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267 2025-12-04T11:31:57.5961996Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5962373Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5962791Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5963097Z #97 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5963396Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5963689Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5964113Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5964485Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5964888Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5965275Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5965536Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5965918Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5966324Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5966692Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5967106Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5967475Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5967826Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5968143Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5968434Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5968752Z #113 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305 2025-12-04T11:31:57.5969014Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5969383Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5969799Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5970193Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5970608Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5970976Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5971263Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5971648Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5972082Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5972452Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5972871Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5973245Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5973515Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5973884Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5974288Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5974667Z #129 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5975071Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5975451Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5975798Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5976101Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5976405Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5976705Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5977125Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5977490Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5977751Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5978131Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5978533Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5978901Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5979315Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5979684Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5980045Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5980345Z #145 _PyObject_Call_Prepend 
from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5980665Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5980978Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5981381Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5981760Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5982209Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5982578Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5982990Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5983388Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5983661Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945 2025-12-04T11:31:57.5984056Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5984459Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5984839Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5985243Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5985613Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5985972Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153 2025-12-04T11:31:57.5986278Z #161 
_PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431 2025-12-04T11:31:57.5986582Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494 2025-12-04T11:31:57.5986884Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215 2025-12-04T11:31:57.5987285Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112 2025-12-04T11:31:57.5987666Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5988068Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5988447Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5988847Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5989214Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5989626Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5989998Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5990410Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114 2025-12-04T11:31:57.5990779Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46 2025-12-04T11:31:57.5991062Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134 2025-12-04T11:31:57.5991371Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291 2025-12-04T11:31:57.5991632Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312 2025-12-04T11:31:57.5991912Z #177 pyrun_file 
from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208 2025-12-04T11:31:57.5992270Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456 2025-12-04T11:31:57.5992618Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90 2025-12-04T11:31:57.5992917Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357 2025-12-04T11:31:57.5993181Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090 2025-12-04T11:31:57.5993438Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58 2025-12-04T11:31:57.5993640Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392 2025-12-04T11:31:57.5993767Z #184 _start from ??:0 2025-12-04T11:31:57.5993886Z #185 from ??:0 2025-12-04T11:31:57.5993905Z 2025-12-04T11:31:57.5993909Z 2025-12-04T11:31:57.5994123Z To execute this test, run the following from the base repo dir: 2025-12-04T11:31:57.5994657Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda 2025-12-04T11:31:57.5994662Z 2025-12-04T11:31:57.5994937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:31:57.5995142Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:31:57.5995349Z ================== 1 failed, 31 deselected, 2 rerun in 23.01s ================== 2025-12-04T11:31:57.5995444Z Got exit code 1 2025-12-04T11:31:57.5995864Z FAILED CONSISTENTLY: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda 2025-12-04T11:31:57.5996274Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:31:57.5996867Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml 2025-12-04T11:31:57.5997029Z ============================= test session starts ============================== 2025-12-04T11:31:57.5997389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T11:31:57.5997494Z cachedir: .pytest_cache 2025-12-04T11:31:57.5998016Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:31:57.5998135Z rootdir: /var/lib/jenkins/workspace 2025-12-04T11:31:57.5998238Z configfile: pytest.ini 2025-12-04T11:31:57.5998826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T11:31:57.5999043Z collecting ... collected 32 items / 18 deselected / 14 selected 2025-12-04T11:31:57.5999183Z stepcurrent: skipping 18 already run items. 
2025-12-04T11:31:57.5999307Z Running 14 items in this shard 2025-12-04T11:31:57.5999312Z 2025-12-04T11:31:57.5999728Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_softmax_cuda PASSED [4.2993s] [ 7%] 2025-12-04T11:31:57.6000188Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_split_with_sizes_cuda PASSED [0.5554s] [ 14%] 2025-12-04T11:31:57.6000676Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_to_int_with_unbacked_size_cuda PASSED [0.4729s] [ 21%] 2025-12-04T11:31:57.6001386Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_grid_cuda PASSED [1.1378s] [ 28%] 2025-12-04T11:31:57.6001978Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_with_unbacked_symint_fallback_cuda PASSED [0.7579s] [ 35%] 2025-12-04T11:31:57.6003026Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_linear_layer_norm_input_cuda W1204 11:31:48.891000 105516 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode 2025-12-04T11:31:57.6003144Z PASSED [4.6323s] [ 42%] 2025-12-04T11:31:57.6003630Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_masked_scatter_cuda PASSED [0.6073s] [ 50%] 2025-12-04T11:31:57.6004134Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_range_tree_divisor_cuda PASSED [0.3520s] [ 57%] 2025-12-04T11:31:57.6004682Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_repeat_cuda PASSED [0.4206s] [ 64%] 2025-12-04T11:31:57.6005218Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic2_cuda PASSED [0.6099s] [ 71%] 2025-12-04T11:31:57.6005789Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_False_cuda PASSED [0.2889s] [ 78%] 2025-12-04T11:31:57.6006381Z 
inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_True_cuda PASSED [0.4219s] [ 85%] 2025-12-04T11:31:57.6006923Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_vertical_pointwise_reduction_fusion_cuda PASSED [0.7337s] [ 92%] 2025-12-04T11:31:57.6007417Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_view_of_slice_cuda PASSED [0.4058s] [100%] 2025-12-04T11:31:57.6007423Z 2025-12-04T11:31:57.6008208Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml - 2025-12-04T11:31:57.6008442Z ====================== 14 passed, 18 deselected in 15.74s ====================== 2025-12-04T11:31:57.6009460Z The following tests failed consistently: ['test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda', 'test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda'] 2025-12-04T11:31:57.6009469Z 2025-12-04T11:31:57.6010058Z FINISHED PRINTING LOG FILE of inductor/test_unbacked_symints 1/1 (test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log) 2025-12-04T11:31:57.6010064Z 2025-12-04T11:31:57.6010430Z Finished inductor/test_unbacked_symints 1/1 ... 
[2025-12-04 11:31:57.322331][8274.932232816], took 3.65min 2025-12-04T11:31:57.6011254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml 2025-12-04T11:31:57.6012143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml 2025-12-04T11:31:57.6012958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml 2025-12-04T11:31:57.6013785Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml 2025-12-04T11:31:57.6014597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml 2025-12-04T11:31:57.6015434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml 2025-12-04T11:31:57.6016252Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml 2025-12-04T11:31:57.9557228Z Uploading logs for 57119749427 to S3 2025-12-04T11:31:58.0481954Z Uploading artifacts took 0.44 seconds 2025-12-04T11:31:58.0482394Z inductor/test_unbacked_symints 1/1 failed! 2025-12-04T11:31:58.0486278Z Running inductor/test_scatter_optimization 1/1 ... 
[2025-12-04 11:31:58.048443][8275.658350637] 2025-12-04T11:31:58.0486894Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:58.0491043Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:58.048877] 2025-12-04T11:32:19.2447604Z 2025-12-04T11:32:19.2448949Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_7430a249406bb12a_.log 2025-12-04T11:32:19.2453360Z Running 8 items in this shard: test/inductor/test_scatter_optimization.py::TestScatterOpt::test_3d_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_dense, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_non_const, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_cross_entropy_loss, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_neg_scatter_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_non_last_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_nonzero_const_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_shorter_index_tensor 2025-12-04T11:32:19.2456959Z 2025-12-04T11:32:19.2457363Z Finished inductor/test_scatter_optimization 1/1 ... [2025-12-04 11:32:19.244555][8296.854464521], took 0.35min 2025-12-04T11:32:19.2559514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.xml 2025-12-04T11:32:19.3293368Z Running inductor/test_mix_order_reduction 1/2 ... 
[2025-12-04 11:32:19.329016][8296.938922828] 2025-12-04T11:32:19.3293967Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:32:19.3297047Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mix_order_reduction.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:32:19.329468] 2025-12-04T12:12:57.4238981Z 2025-12-04T12:12:57.4240251Z PRINTING LOG FILE of inductor/test_mix_order_reduction 1/2 (test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log) 2025-12-04T12:12:57.4242300Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml 2025-12-04T12:12:57.4243469Z ============================= test session starts ============================== 2025-12-04T12:12:57.4244368Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.4245238Z cachedir: .pytest_cache 2025-12-04T12:12:57.4246251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.4247363Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.4247894Z configfile: pytest.ini 2025-12-04T12:12:57.4249299Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.4250174Z collecting ... 
collected 380 items 2025-12-04T12:12:57.4250575Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T12:12:57.4451920Z Running 175 items in this shard: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape0, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_multi_workspace_allocation, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_non_contiguous_input, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_3layer_split_reduction, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_XBLOCK_coordest_tuning, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape1, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_multi_workspace_allocation, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_non_contiguous_input, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims1 2025-12-04T12:12:57.4585009Z 2025-12-04T12:12:57.4585660Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape0 PASSED [5.5904s] [ 0%] 2025-12-04T12:12:57.4586999Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape1 PASSED [1.0034s] [ 1%] 2025-12-04T12:12:57.4588356Z 
inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0 PASSED [1.0544s] [ 1%] 2025-12-04T12:12:57.4589735Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape1 PASSED [1.4576s] [ 2%] 2025-12-04T12:12:57.4591126Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0 PASSED [1.4458s] [ 2%] 2025-12-04T12:12:57.4592505Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape1 PASSED [1.4676s] [ 3%] 2025-12-04T12:12:57.4593828Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2 PASSED [2.5870s] [ 4%] 2025-12-04T12:12:57.4595344Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0 PASSED [0.8181s] [ 4%] 2025-12-04T12:12:57.4596819Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0 PASSED [0.8190s] [ 5%] 2025-12-04T12:12:57.4598289Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape1 PASSED [1.0635s] [ 5%] 2025-12-04T12:12:57.4599833Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_False_shape0 PASSED [0.7973s] [ 6%] 2025-12-04T12:12:57.4601456Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape2 PASSED [0.5962s] [ 6%] 2025-12-04T12:12:57.4603031Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape0 PASSED [1.4498s] [ 7%] 
2025-12-04T12:12:57.4605556Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [ 8%] 2025-12-04T12:12:57.4607139Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape0 PASSED [0.4343s] [ 8%] 2025-12-04T12:12:57.4608707Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape2 SKIPPED [0.0030s] (Invalid combination) [ 9%] 2025-12-04T12:12:57.4610262Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape1 PASSED [0.4336s] [ 9%] 2025-12-04T12:12:57.4611832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [ 10%] 2025-12-04T12:12:57.4613411Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0 PASSED [0.4593s] [ 10%] 2025-12-04T12:12:57.4614871Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape1 PASSED [0.4727s] [ 11%] 2025-12-04T12:12:57.4616320Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape0 PASSED [0.4698s] [ 12%] 2025-12-04T12:12:57.4617753Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2 PASSED [0.4813s] [ 12%] 2025-12-04T12:12:57.4619194Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0 PASSED [0.4735s] [ 13%] 2025-12-04T12:12:57.4620463Z 
inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_multi_workspace_allocation PASSED [0.5243s] [ 13%] 2025-12-04T12:12:57.4621516Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_non_contiguous_input PASSED [0.4761s] [ 14%] 2025-12-04T12:12:57.4622986Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.2257s] [ 14%] 2025-12-04T12:12:57.4624915Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [ 14%] 2025-12-04T12:12:57.4626757Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1551s] [ 14%] 2025-12-04T12:12:57.4627721Z 2025-12-04T12:12:57.4627917Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.4628760Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4629557Z Traceback (most recent call last): 2025-12-04T12:12:57.4630258Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4631054Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4631636Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4631972Z 2025-12-04T12:12:57.4632183Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4633495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4634556Z 2025-12-04T12:12:57.4634834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4635498Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4635958Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4636296Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4636725Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4637398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4637967Z graph_break [] 2025-12-04T12:12:57.4638333Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4639407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4640345Z warnings.warn( 2025-12-04T12:12:57.4641058Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4641865Z Traceback (most recent call last): 2025-12-04T12:12:57.4642801Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4643618Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4644171Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4644512Z 2025-12-04T12:12:57.4644746Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4646045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4647148Z 2025-12-04T12:12:57.4647419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4648052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4648531Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4648856Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4649294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4649997Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4650568Z graph_break [] 2025-12-04T12:12:57.4650942Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4652040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4653024Z warnings.warn( 2025-12-04T12:12:57.4653392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4653866Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4654206Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4654686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4655503Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4656070Z graph_break [] 2025-12-04T12:12:57.4656438Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4657531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4658479Z warnings.warn( 2025-12-04T12:12:57.4658784Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.4659644Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4660453Z Traceback (most recent call last): 2025-12-04T12:12:57.4661147Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4661969Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4662488Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4662830Z 2025-12-04T12:12:57.4663041Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4664313Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4665364Z 2025-12-04T12:12:57.4665635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4666237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4666700Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4667028Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4667445Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4668135Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4668701Z graph_break [] 2025-12-04T12:12:57.4669067Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4670124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4671072Z warnings.warn( 
2025-12-04T12:12:57.4671442Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4671891Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4672219Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4672644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4673322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4673881Z graph_break [] 2025-12-04T12:12:57.4674242Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4675319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4676265Z warnings.warn( 2025-12-04T12:12:57.4676628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4677092Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4677426Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4677842Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4678527Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4679101Z graph_break [] 2025-12-04T12:12:57.4679471Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4680570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4681524Z warnings.warn( 2025-12-04T12:12:57.4682559Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml - 2025-12-04T12:12:57.4683696Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.4685076Z FAILED [0.1551s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4686335Z 2025-12-04T12:12:57.4686550Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4687829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4688919Z 2025-12-04T12:12:57.4689196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4689776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.4690289Z ============== 1 failed, 22 passed, 3 skipped, 2 rerun in 25.02s =============== 2025-12-04T12:12:57.4690730Z Got exit code 1 2025-12-04T12:12:57.4690996Z Retrying single test... 
2025-12-04T12:12:57.4691796Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml 2025-12-04T12:12:57.4695653Z ============================= test session starts ============================== 2025-12-04T12:12:57.4696307Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.4696904Z cachedir: .pytest_cache 2025-12-04T12:12:57.4697583Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.4698352Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.4698698Z configfile: pytest.ini 2025-12-04T12:12:57.4699450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.4700389Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.4701999Z stepcurrent: skipping 25 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4703272Z Running 1 items in this shard 2025-12-04T12:12:57.4703484Z 2025-12-04T12:12:57.4704379Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [100%] 2025-12-04T12:12:57.4706304Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1644s] [100%] 2025-12-04T12:12:57.4708146Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1590s] [100%] 2025-12-04T12:12:57.4709101Z 2025-12-04T12:12:57.4709240Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.4710071Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4710867Z Traceback (most recent call last): 2025-12-04T12:12:57.4711688Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4712484Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4713017Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4713349Z 2025-12-04T12:12:57.4713557Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4714879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4715951Z 2025-12-04T12:12:57.4716256Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4716879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4717339Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4717725Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4718272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4718948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4719400Z graph_break [] 2025-12-04T12:12:57.4719768Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4720842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4721778Z warnings.warn( 2025-12-04T12:12:57.4722553Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4723371Z Traceback (most recent call last): 2025-12-04T12:12:57.4724054Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4724854Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4725386Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4725717Z 2025-12-04T12:12:57.4725942Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4727204Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4728265Z 2025-12-04T12:12:57.4728524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4729137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4729601Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4729915Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4730458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4731139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4731571Z graph_break [] 2025-12-04T12:12:57.4731934Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4733003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4733955Z warnings.warn( 2025-12-04T12:12:57.4734311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4734773Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4735101Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4735518Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4736197Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4736762Z graph_break [] 2025-12-04T12:12:57.4737180Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4738234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4739179Z warnings.warn( 2025-12-04T12:12:57.4739490Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.4740351Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4741160Z Traceback (most recent call last): 2025-12-04T12:12:57.4741854Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4742679Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4743203Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4743583Z 2025-12-04T12:12:57.4743795Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4745060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4746115Z 2025-12-04T12:12:57.4746385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4746990Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4747453Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4747789Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4748325Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4749010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4749465Z graph_break [] 2025-12-04T12:12:57.4749833Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4750891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4751846Z warnings.warn( 
2025-12-04T12:12:57.4752224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4752676Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4753012Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4753441Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4754126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4754686Z graph_break [] 2025-12-04T12:12:57.4755059Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4756129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4757065Z warnings.warn( 2025-12-04T12:12:57.4757439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4757901Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4758229Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4758644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4759324Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4759888Z graph_break [] 2025-12-04T12:12:57.4760240Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4761306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4762316Z warnings.warn( 2025-12-04T12:12:57.4763322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml - 2025-12-04T12:12:57.4764411Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.4765814Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4767016Z 2025-12-04T12:12:57.4767227Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4768505Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4769595Z 2025-12-04T12:12:57.4769870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4770466Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.4770978Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.4771418Z Got exit code 1 2025-12-04T12:12:57.4771670Z Retrying single test... 
2025-12-04T12:12:57.4772489Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml 2025-12-04T12:12:57.4773417Z ============================= test session starts ============================== 2025-12-04T12:12:57.4774065Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.4774644Z cachedir: .pytest_cache 2025-12-04T12:12:57.4775344Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.4776116Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.4776448Z configfile: pytest.ini 2025-12-04T12:12:57.4777207Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.4778136Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.4779495Z stepcurrent: skipping 25 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4780728Z Running 1 items in this shard 2025-12-04T12:12:57.4780945Z 2025-12-04T12:12:57.4781846Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5663s] [100%] 2025-12-04T12:12:57.4783775Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%] 2025-12-04T12:12:57.4785615Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1590s] [100%] 2025-12-04T12:12:57.4786556Z 2025-12-04T12:12:57.4786709Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.4787528Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4788338Z Traceback (most recent call last): 2025-12-04T12:12:57.4789037Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4789828Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4790380Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4790725Z 2025-12-04T12:12:57.4790933Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4792206Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4793254Z 2025-12-04T12:12:57.4793561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4794161Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4794624Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4794984Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4795514Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4796199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4796702Z graph_break [] 2025-12-04T12:12:57.4797067Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4798130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4799079Z warnings.warn( 2025-12-04T12:12:57.4799795Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4800589Z Traceback (most recent call last): 2025-12-04T12:12:57.4801494Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4802349Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4802885Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4803215Z 2025-12-04T12:12:57.4803426Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4804700Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4805767Z 2025-12-04T12:12:57.4806025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4806639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4807091Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4807422Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4807966Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4808642Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4809091Z graph_break [] 2025-12-04T12:12:57.4809459Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4810543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4811482Z warnings.warn( 2025-12-04T12:12:57.4811860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4812329Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4812647Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4813077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4813751Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4814318Z graph_break [] 2025-12-04T12:12:57.4814667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4815832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4816788Z warnings.warn( 2025-12-04T12:12:57.4817082Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.4817922Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.4818742Z Traceback (most recent call last): 2025-12-04T12:12:57.4819488Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4820273Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4820813Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4821191Z 2025-12-04T12:12:57.4821421Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4822698Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4823796Z 2025-12-04T12:12:57.4824057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4824674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4825141Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4825463Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4826016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4826707Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4827159Z graph_break [] 2025-12-04T12:12:57.4827515Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4828592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4829541Z warnings.warn( 
2025-12-04T12:12:57.4829909Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4830382Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4830719Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4831145Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4831810Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4832373Z graph_break [] 2025-12-04T12:12:57.4832734Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4833779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4834727Z warnings.warn( 2025-12-04T12:12:57.4835101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4835571Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4835888Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4836320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4837001Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4837557Z graph_break [] 2025-12-04T12:12:57.4837932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4838993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4839945Z warnings.warn( 2025-12-04T12:12:57.4840895Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml - 2025-12-04T12:12:57.4842052Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.4843498Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4844679Z 2025-12-04T12:12:57.4844906Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4846203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4847269Z 2025-12-04T12:12:57.4847566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4848148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.4848662Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.4849120Z Got exit code 1 2025-12-04T12:12:57.4850126Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.4851509Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.4852677Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml 2025-12-04T12:12:57.4853583Z ============================= test session starts ============================== 2025-12-04T12:12:57.4854230Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.4854815Z cachedir: .pytest_cache 2025-12-04T12:12:57.4855516Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.4856274Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.4856619Z configfile: pytest.ini 2025-12-04T12:12:57.4857373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.4858289Z collecting ... collected 380 items / 26 deselected / 354 selected 2025-12-04T12:12:57.4858779Z stepcurrent: skipping 26 already run items. 2025-12-04T12:12:57.4859158Z Running 149 items in this shard 2025-12-04T12:12:57.4859363Z 2025-12-04T12:12:57.4860273Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5416s] [ 0%] 2025-12-04T12:12:57.4862184Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [ 0%] 2025-12-04T12:12:57.4864021Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1569s] [ 0%] 2025-12-04T12:12:57.4864975Z 2025-12-04T12:12:57.4865113Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.4865946Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4866743Z Traceback (most recent call last): 2025-12-04T12:12:57.4867436Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4868227Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4868755Z ValueError: not enough values to unpack (expected 
2, got 0) 2025-12-04T12:12:57.4869085Z 2025-12-04T12:12:57.4869327Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4870604Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4871674Z 2025-12-04T12:12:57.4871934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4872585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4873039Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4873369Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4873911Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4874636Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4875072Z graph_break [] 2025-12-04T12:12:57.4875441Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4876546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4877484Z warnings.warn( 2025-12-04T12:12:57.4878207Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4879018Z Traceback (most recent call last): 2025-12-04T12:12:57.4879719Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4880498Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4881032Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4881360Z 2025-12-04T12:12:57.4881580Z To execute this test, run the 
following from the base repo dir: 2025-12-04T12:12:57.4882899Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4883968Z 2025-12-04T12:12:57.4884228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4884846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4885313Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4885633Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4886180Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4886873Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4887326Z graph_break [] 2025-12-04T12:12:57.4887677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4888755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4889708Z warnings.warn( 2025-12-04T12:12:57.4890069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4890537Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4890867Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4891294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4891968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4892540Z graph_break [] 2025-12-04T12:12:57.4892907Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4893961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4894913Z warnings.warn( 2025-12-04T12:12:57.4895273Z =================================== FAILURES =================================== 2025-12-04T12:12:57.4896121Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4896916Z Traceback (most recent call last): 2025-12-04T12:12:57.4897611Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4898526Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4899051Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4899395Z 2025-12-04T12:12:57.4899605Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4901111Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4902236Z 2025-12-04T12:12:57.4902520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4903121Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4903592Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4903920Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4904460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4905134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4905581Z graph_break [] 2025-12-04T12:12:57.4905945Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4907000Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4907951Z warnings.warn( 2025-12-04T12:12:57.4908328Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4908791Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4909105Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4909537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4910220Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4910774Z graph_break [] 2025-12-04T12:12:57.4911141Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4912205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4913158Z warnings.warn( 2025-12-04T12:12:57.4913520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4913978Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4914307Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4914726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4915404Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4915966Z graph_break [] 2025-12-04T12:12:57.4916312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4917378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4918325Z warnings.warn( 2025-12-04T12:12:57.4919285Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml - 2025-12-04T12:12:57.4920374Z =========================== short test summary info ============================ 2025-12-04T12:12:57.4921798Z FAILED [0.1569s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4923076Z 2025-12-04T12:12:57.4923291Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4924621Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4925678Z 2025-12-04T12:12:57.4925953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4926515Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.4927071Z ================== 1 failed, 26 deselected, 2 rerun in 4.91s =================== 2025-12-04T12:12:57.4927503Z Got exit code 1 2025-12-04T12:12:57.4927752Z Retrying single test... 
2025-12-04T12:12:57.4928596Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml 2025-12-04T12:12:57.4929514Z ============================= test session starts ============================== 2025-12-04T12:12:57.4930157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.4930729Z cachedir: .pytest_cache 2025-12-04T12:12:57.4931419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.4932182Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.4932512Z configfile: pytest.ini 2025-12-04T12:12:57.4933271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.4934206Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.4935575Z stepcurrent: skipping 26 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4936812Z Running 1 items in this shard 2025-12-04T12:12:57.4937028Z 2025-12-04T12:12:57.4937921Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5363s] [100%] 2025-12-04T12:12:57.4939832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [100%] 2025-12-04T12:12:57.4941664Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%] 2025-12-04T12:12:57.4942612Z 2025-12-04T12:12:57.4942762Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.4943583Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4944393Z Traceback (most recent call last): 2025-12-04T12:12:57.4945091Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4945876Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4946399Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4946745Z 2025-12-04T12:12:57.4946953Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4948271Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4949331Z 2025-12-04T12:12:57.4949609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4950213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4950677Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4951007Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4951577Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4952262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4952710Z graph_break [] 2025-12-04T12:12:57.4953072Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4954167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4955153Z warnings.warn( 2025-12-04T12:12:57.4955870Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4956674Z Traceback (most recent call last): 2025-12-04T12:12:57.4957372Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4958161Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4958701Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4959036Z 2025-12-04T12:12:57.4959247Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4960518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4961585Z 2025-12-04T12:12:57.4961850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4962536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4962993Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4963331Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4963881Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4964574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4965016Z graph_break [] 2025-12-04T12:12:57.4965387Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4966460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4967403Z warnings.warn( 2025-12-04T12:12:57.4967783Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4968246Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4968583Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4968992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4969677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4970250Z graph_break [] 2025-12-04T12:12:57.4970606Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4971681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4972644Z warnings.warn( 2025-12-04T12:12:57.4972942Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.4973763Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.4974617Z Traceback (most recent call last): 2025-12-04T12:12:57.4975314Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.4976086Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.4976614Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.4976953Z 2025-12-04T12:12:57.4977161Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.4978467Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.4979546Z 2025-12-04T12:12:57.4979805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.4980413Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4980881Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4981244Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4981777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4982470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4982919Z graph_break [] 2025-12-04T12:12:57.4983269Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4984338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4985285Z warnings.warn( 
2025-12-04T12:12:57.4985661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4986109Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4986437Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4986996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4987672Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4988246Z graph_break [] 2025-12-04T12:12:57.4988617Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4989684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4990618Z warnings.warn( 2025-12-04T12:12:57.4990997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.4991461Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.4991780Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.4992209Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.4992888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.4993456Z graph_break [] 2025-12-04T12:12:57.4993808Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.4994873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.4995818Z warnings.warn( 2025-12-04T12:12:57.4996762Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml - 2025-12-04T12:12:57.4997861Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.4999230Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5000426Z 2025-12-04T12:12:57.5000695Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5002483Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5003538Z 2025-12-04T12:12:57.5003799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5004490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5005010Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.5005430Z Got exit code 1 2025-12-04T12:12:57.5005696Z Retrying single test... 
2025-12-04T12:12:57.5006558Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml 2025-12-04T12:12:57.5007475Z ============================= test session starts ============================== 2025-12-04T12:12:57.5008155Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5008746Z cachedir: .pytest_cache 2025-12-04T12:12:57.5009437Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5010200Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5010531Z configfile: pytest.ini 2025-12-04T12:12:57.5011294Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5012222Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5013582Z stepcurrent: skipping 26 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5014833Z Running 1 items in this shard 2025-12-04T12:12:57.5015054Z 2025-12-04T12:12:57.5015953Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5584s] [100%] 2025-12-04T12:12:57.5017875Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1657s] [100%] 2025-12-04T12:12:57.5019712Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1596s] [100%] 2025-12-04T12:12:57.5020651Z 2025-12-04T12:12:57.5020789Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5021624Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5022431Z Traceback (most recent call last): 2025-12-04T12:12:57.5023127Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5066291Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5067025Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5067367Z 2025-12-04T12:12:57.5067615Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5068897Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5069981Z 2025-12-04T12:12:57.5070246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5071031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5071515Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5071837Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5072388Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5073086Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5073541Z graph_break [] 2025-12-04T12:12:57.5073936Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5075017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5076012Z warnings.warn( 2025-12-04T12:12:57.5076714Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5077527Z Traceback (most recent call last): 2025-12-04T12:12:57.5078267Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5079041Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5079574Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5079919Z 2025-12-04T12:12:57.5080129Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5081403Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5082545Z 2025-12-04T12:12:57.5082823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5083433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5083897Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5084231Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5084765Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5085455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5085909Z graph_break [] 2025-12-04T12:12:57.5086256Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5087332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5088280Z warnings.warn( 2025-12-04T12:12:57.5088652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5089103Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5089433Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5089860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5090533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5091098Z graph_break [] 2025-12-04T12:12:57.5091466Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5092530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5093467Z warnings.warn( 2025-12-04T12:12:57.5093776Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5094613Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5095412Z Traceback (most recent call last): 2025-12-04T12:12:57.5096107Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5096939Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5097479Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5097809Z 2025-12-04T12:12:57.5098020Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5099291Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5100396Z 2025-12-04T12:12:57.5100657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5101492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5102025Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5102357Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5102904Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5103597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5104082Z graph_break [] 2025-12-04T12:12:57.5104447Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5105520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5106451Z warnings.warn( 
2025-12-04T12:12:57.5106832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5107297Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5107615Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5108038Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5108719Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5109287Z graph_break [] 2025-12-04T12:12:57.5109644Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5110711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5111645Z warnings.warn( 2025-12-04T12:12:57.5112001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5112449Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5112771Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5113191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5113858Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5114432Z graph_break [] 2025-12-04T12:12:57.5114780Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5115828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5116770Z warnings.warn( 2025-12-04T12:12:57.5117727Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml - 2025-12-04T12:12:57.5118811Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5120171Z FAILED [0.1596s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5121384Z 2025-12-04T12:12:57.5121585Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5122977Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5124026Z 2025-12-04T12:12:57.5124294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5124863Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5125367Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.5125805Z Got exit code 1 2025-12-04T12:12:57.5126857Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5128220Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5129402Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml 2025-12-04T12:12:57.5130314Z ============================= test session starts ============================== 2025-12-04T12:12:57.5130980Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5131545Z cachedir: .pytest_cache 2025-12-04T12:12:57.5132218Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5132966Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5133294Z configfile: pytest.ini 2025-12-04T12:12:57.5134021Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5134931Z collecting ... collected 380 items / 27 deselected / 353 selected 2025-12-04T12:12:57.5135401Z stepcurrent: skipping 27 already run items. 2025-12-04T12:12:57.5135755Z Running 148 items in this shard 2025-12-04T12:12:57.5135959Z 2025-12-04T12:12:57.5136847Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5578s] [ 0%] 2025-12-04T12:12:57.5138782Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1666s] [ 0%] 2025-12-04T12:12:57.5140589Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1637s] [ 0%] 2025-12-04T12:12:57.5141515Z 2025-12-04T12:12:57.5141655Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5142459Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5143244Z Traceback (most recent call last): 2025-12-04T12:12:57.5143927Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5144700Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5145207Z ValueError: not enough values to unpack (expected 2, 
got 0) 2025-12-04T12:12:57.5145540Z 2025-12-04T12:12:57.5145741Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5146991Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5148037Z 2025-12-04T12:12:57.5148297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5148890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5149332Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5149678Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5150195Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5150864Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5151291Z graph_break [] 2025-12-04T12:12:57.5151635Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5154652Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5157605Z return x.grad, w.grad 2025-12-04T12:12:57.5158478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5159404Z warnings.warn( 2025-12-04T12:12:57.5162287Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5165140Z return x.grad, w.grad 2025-12-04T12:12:57.5165852Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5166637Z Traceback (most recent call last): 2025-12-04T12:12:57.5167307Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5168071Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5168580Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5168912Z 2025-12-04T12:12:57.5169114Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5170366Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5171410Z 2025-12-04T12:12:57.5171671Z This message can be suppressed by setting 
PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5172267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5172718Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5173032Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5173559Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5174220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5174650Z graph_break [] 2025-12-04T12:12:57.5174998Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5178025Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5180854Z return x.grad, w.grad 2025-12-04T12:12:57.5181739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5182700Z warnings.warn( 2025-12-04T12:12:57.5185476Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5188378Z return x.grad, w.grad 2025-12-04T12:12:57.5188745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5189187Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5189496Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5189905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5190569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5191118Z graph_break [] 2025-12-04T12:12:57.5191461Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5194434Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5197269Z return x.grad, w.grad 2025-12-04T12:12:57.5198152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5199083Z warnings.warn( 2025-12-04T12:12:57.5202070Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5204985Z return x.grad, w.grad 2025-12-04T12:12:57.5205284Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5206104Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5206893Z Traceback (most recent call last): 2025-12-04T12:12:57.5207561Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5208334Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5208855Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5209180Z 2025-12-04T12:12:57.5209467Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5210714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 
2025-12-04T12:12:57.5211766Z 2025-12-04T12:12:57.5212020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5212658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5213106Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5213414Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5213981Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5214648Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5215081Z graph_break [] 2025-12-04T12:12:57.5215422Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5218453Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5221317Z return x.grad, w.grad 2025-12-04T12:12:57.5222196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5223125Z warnings.warn( 2025-12-04T12:12:57.5225915Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5228760Z return x.grad, w.grad 2025-12-04T12:12:57.5229141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5229589Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5229893Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5230305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5230968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5231520Z graph_break [] 2025-12-04T12:12:57.5231860Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5234856Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5237717Z return x.grad, w.grad 2025-12-04T12:12:57.5238649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5239602Z warnings.warn( 2025-12-04T12:12:57.5242491Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5245391Z return x.grad, w.grad 2025-12-04T12:12:57.5245785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5246257Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5246578Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5247043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5247728Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5248297Z graph_break [] 2025-12-04T12:12:57.5248649Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5249722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5250674Z warnings.warn( 2025-12-04T12:12:57.5253485Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor 
is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5256350Z return x.grad, w.grad 2025-12-04T12:12:57.5257326Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml - 2025-12-04T12:12:57.5258436Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5259802Z FAILED [0.1637s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5260986Z 2025-12-04T12:12:57.5261209Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5262467Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5263523Z 2025-12-04T12:12:57.5263783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5264359Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5264866Z ================== 1 failed, 27 deselected, 2 rerun in 4.94s =================== 2025-12-04T12:12:57.5265281Z Got exit code 1 2025-12-04T12:12:57.5265537Z Retrying single test... 
2025-12-04T12:12:57.5266345Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml 2025-12-04T12:12:57.5267255Z ============================= test session starts ============================== 2025-12-04T12:12:57.5267936Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5268523Z cachedir: .pytest_cache 2025-12-04T12:12:57.5269212Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5269958Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5270303Z configfile: pytest.ini 2025-12-04T12:12:57.5271111Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5272046Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5273390Z stepcurrent: skipping 27 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5274667Z Running 1 items in this shard 2025-12-04T12:12:57.5274905Z 2025-12-04T12:12:57.5275804Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5914s] [100%] 2025-12-04T12:12:57.5277704Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1669s] [100%] 2025-12-04T12:12:57.5279528Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1648s] [100%] 2025-12-04T12:12:57.5280485Z 2025-12-04T12:12:57.5280623Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5281448Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5282322Z Traceback (most recent call last): 2025-12-04T12:12:57.5283007Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5283799Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5284341Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5284673Z 2025-12-04T12:12:57.5284900Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5286159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5287216Z 2025-12-04T12:12:57.5287476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5288096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5288564Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5288881Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5289424Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5290106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5290539Z graph_break [] 2025-12-04T12:12:57.5290900Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5293944Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5296806Z return x.grad, w.grad 2025-12-04T12:12:57.5297704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5298641Z warnings.warn( 2025-12-04T12:12:57.5301729Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5304693Z return x.grad, w.grad 2025-12-04T12:12:57.5305428Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5306234Z Traceback (most recent call last): 2025-12-04T12:12:57.5306916Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5307703Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5308240Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5308571Z 2025-12-04T12:12:57.5308791Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5310045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5311102Z 2025-12-04T12:12:57.5311365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5311976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5312438Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5312749Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5313283Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5313966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.5314398Z graph_break [] 2025-12-04T12:12:57.5314764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5317759Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5320621Z return x.grad, w.grad 2025-12-04T12:12:57.5321522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5322516Z warnings.warn( 2025-12-04T12:12:57.5325370Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5328230Z return x.grad, w.grad 2025-12-04T12:12:57.5328627Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5329090Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5329405Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5329942Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5330623Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5331176Z graph_break [] 2025-12-04T12:12:57.5331575Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5334581Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5337473Z return x.grad, w.grad 2025-12-04T12:12:57.5338371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5339307Z warnings.warn( 2025-12-04T12:12:57.5342124Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5344985Z return x.grad, w.grad 2025-12-04T12:12:57.5345315Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5346139Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5346946Z Traceback (most recent call last): 2025-12-04T12:12:57.5347641Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5348436Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5348961Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5349308Z 2025-12-04T12:12:57.5349519Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5350789Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5351842Z 2025-12-04T12:12:57.5352117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5352717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5353181Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5353510Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5354052Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5354727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5355176Z graph_break [] 2025-12-04T12:12:57.5355568Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5358590Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5361471Z return x.grad, w.grad 2025-12-04T12:12:57.5362420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5363412Z warnings.warn( 2025-12-04T12:12:57.5366230Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5369091Z return x.grad, w.grad 2025-12-04T12:12:57.5369471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5369935Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5370267Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5370676Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5371361Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5371929Z graph_break [] 2025-12-04T12:12:57.5372293Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5375291Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5378157Z return x.grad, w.grad 2025-12-04T12:12:57.5379061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5380011Z warnings.warn( 2025-12-04T12:12:57.5382807Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5385655Z return x.grad, w.grad 2025-12-04T12:12:57.5386035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5386554Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5386883Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5387299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5387979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5388545Z graph_break [] 2025-12-04T12:12:57.5388916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5390010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5390958Z warnings.warn( 2025-12-04T12:12:57.5393790Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5396674Z return x.grad, w.grad 2025-12-04T12:12:57.5397663Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml - 2025-12-04T12:12:57.5398759Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5400129Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5401529Z 2025-12-04T12:12:57.5401747Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5403075Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5404125Z 2025-12-04T12:12:57.5404385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5404962Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5405478Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ================== 2025-12-04T12:12:57.5405916Z Got exit code 1 2025-12-04T12:12:57.5406167Z Retrying single test... 
2025-12-04T12:12:57.5406981Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml 2025-12-04T12:12:57.5407897Z ============================= test session starts ============================== 2025-12-04T12:12:57.5408535Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5409120Z cachedir: .pytest_cache 2025-12-04T12:12:57.5409811Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5410574Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5410903Z configfile: pytest.ini 2025-12-04T12:12:57.5411658Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5412588Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5413949Z stepcurrent: skipping 27 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5415255Z Running 1 items in this shard 2025-12-04T12:12:57.5415476Z 2025-12-04T12:12:57.5416368Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5656s] [100%] 2025-12-04T12:12:57.5418320Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1673s] [100%] 2025-12-04T12:12:57.5420133Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1649s] [100%] 2025-12-04T12:12:57.5421111Z 2025-12-04T12:12:57.5421259Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5422077Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5422919Z Traceback (most recent call last): 2025-12-04T12:12:57.5423616Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5424394Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5424923Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5425264Z 2025-12-04T12:12:57.5425475Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5426746Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5427797Z 2025-12-04T12:12:57.5428069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5428670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5429141Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5429471Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5430001Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5430682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5431125Z graph_break [] 2025-12-04T12:12:57.5431479Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5434484Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5437359Z return x.grad, w.grad 2025-12-04T12:12:57.5438265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5439214Z warnings.warn( 2025-12-04T12:12:57.5442064Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5445097Z return x.grad, w.grad 2025-12-04T12:12:57.5445912Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5446741Z Traceback (most recent call last): 2025-12-04T12:12:57.5447442Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5448273Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5448814Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5449147Z 2025-12-04T12:12:57.5449372Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5450676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5451768Z 2025-12-04T12:12:57.5452025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5452647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5453117Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5453447Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5453984Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5454672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.5455118Z graph_break [] 2025-12-04T12:12:57.5455470Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5458478Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5461356Z return x.grad, w.grad 2025-12-04T12:12:57.5462262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5463213Z warnings.warn( 2025-12-04T12:12:57.5465999Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5468854Z return x.grad, w.grad 2025-12-04T12:12:57.5469246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5469709Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5470026Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5470454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5471139Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5471708Z graph_break [] 2025-12-04T12:12:57.5472059Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5475123Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5477992Z return x.grad, w.grad 2025-12-04T12:12:57.5478898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5479888Z warnings.warn( 2025-12-04T12:12:57.5482742Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5485658Z return x.grad, w.grad 2025-12-04T12:12:57.5485986Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5486819Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.5487624Z Traceback (most recent call last): 2025-12-04T12:12:57.5488300Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5489085Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5489622Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5489951Z 2025-12-04T12:12:57.5490163Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5491427Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5492494Z 2025-12-04T12:12:57.5492752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5493366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5493824Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5494157Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5494702Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5495394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5495834Z graph_break [] 2025-12-04T12:12:57.5496195Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5499200Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5502269Z return x.grad, w.grad 2025-12-04T12:12:57.5503251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5504213Z warnings.warn( 2025-12-04T12:12:57.5507064Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5509960Z return x.grad, w.grad 2025-12-04T12:12:57.5510361Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5510815Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5511153Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5511634Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5512308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5512875Z graph_break [] 2025-12-04T12:12:57.5513241Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5516240Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5519109Z return x.grad, w.grad 2025-12-04T12:12:57.5519989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5520935Z warnings.warn( 2025-12-04T12:12:57.5523787Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.5526639Z return x.grad, w.grad 2025-12-04T12:12:57.5527032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5527486Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5527812Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5528241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5528910Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5529475Z graph_break [] 2025-12-04T12:12:57.5529845Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5530924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5531861Z warnings.warn( 2025-12-04T12:12:57.5534725Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.5537595Z return x.grad, w.grad 2025-12-04T12:12:57.5538607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml - 2025-12-04T12:12:57.5539715Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5541101Z FAILED [0.1649s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5542324Z 2025-12-04T12:12:57.5542534Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5543805Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5544859Z 2025-12-04T12:12:57.5545134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5545705Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.5546218Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.5546657Z Got exit code 1 2025-12-04T12:12:57.5547652Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.5549023Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5550189Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml 2025-12-04T12:12:57.5551113Z ============================= test session starts ============================== 2025-12-04T12:12:57.5551771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5552347Z cachedir: .pytest_cache 2025-12-04T12:12:57.5553045Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5553170Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5553295Z configfile: pytest.ini 2025-12-04T12:12:57.5553874Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5554114Z collecting ... collected 380 items / 28 deselected / 352 selected 2025-12-04T12:12:57.5554254Z stepcurrent: skipping 28 already run items. 2025-12-04T12:12:57.5554367Z Running 147 items in this shard 2025-12-04T12:12:57.5554372Z 2025-12-04T12:12:57.5555406Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) 
[ 0%] 2025-12-04T12:12:57.5556400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.5557461Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.5558465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.5559401Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5359s] [ 3%] 2025-12-04T12:12:57.5560290Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1634s] [ 3%] 2025-12-04T12:12:57.5561135Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [ 3%] 2025-12-04T12:12:57.5561184Z 2025-12-04T12:12:57.5561327Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5561873Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5562007Z Traceback (most recent call last): 2025-12-04T12:12:57.5562521Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.5562722Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5562943Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5562949Z 2025-12-04T12:12:57.5563162Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5564101Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5564109Z 2025-12-04T12:12:57.5564372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5564589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5564714Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5564827Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5565178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5565395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5565491Z graph_break [] 2025-12-04T12:12:57.5565719Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5566445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5566547Z warnings.warn( 2025-12-04T12:12:57.5567109Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5567227Z Traceback (most recent call last): 2025-12-04T12:12:57.5567698Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5567890Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, 
opt_f) 2025-12-04T12:12:57.5568097Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5568102Z 2025-12-04T12:12:57.5568321Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5569245Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5569255Z 2025-12-04T12:12:57.5569564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5569780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5569890Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5570015Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5570350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5570578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5570703Z graph_break [] 2025-12-04T12:12:57.5570916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5571648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5571794Z warnings.warn( 2025-12-04T12:12:57.5572003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5572128Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5572287Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5572513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5572847Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5572943Z graph_break [] 2025-12-04T12:12:57.5573167Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5573880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5573976Z warnings.warn( 2025-12-04T12:12:57.5574131Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5574680Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5574814Z Traceback (most recent call last): 2025-12-04T12:12:57.5575277Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5575470Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5575689Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5575694Z 2025-12-04T12:12:57.5575900Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5576842Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5576850Z 2025-12-04T12:12:57.5577108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5577315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5577436Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5577552Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5577881Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5578106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5578201Z graph_break [] 
2025-12-04T12:12:57.5578420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5579137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5579234Z warnings.warn( 2025-12-04T12:12:57.5579455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5579567Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5579678Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5579904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5580266Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5580377Z graph_break [] 2025-12-04T12:12:57.5580586Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5581297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5581438Z warnings.warn( 2025-12-04T12:12:57.5581650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5581757Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5581881Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5582130Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5582470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5582565Z graph_break [] 2025-12-04T12:12:57.5582806Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5583533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation 
natively, skipping 2025-12-04T12:12:57.5583631Z warnings.warn( 2025-12-04T12:12:57.5584435Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml - 2025-12-04T12:12:57.5584615Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5585677Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5585685Z 2025-12-04T12:12:57.5585910Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5586844Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5586850Z 2025-12-04T12:12:57.5587125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5587301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5587512Z ============= 1 failed, 4 skipped, 28 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.5587620Z Got exit code 1 2025-12-04T12:12:57.5587725Z Retrying single test... 
2025-12-04T12:12:57.5588355Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml 2025-12-04T12:12:57.5588526Z ============================= test session starts ============================== 2025-12-04T12:12:57.5588870Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5588990Z cachedir: .pytest_cache 2025-12-04T12:12:57.5589497Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5589617Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5589734Z configfile: pytest.ini 2025-12-04T12:12:57.5590309Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5590547Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5591558Z stepcurrent: skipping 32 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5591706Z Running 1 items in this shard 2025-12-04T12:12:57.5591713Z 2025-12-04T12:12:57.5592617Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5451s] [100%] 2025-12-04T12:12:57.5593542Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%] 2025-12-04T12:12:57.5594370Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1597s] [100%] 2025-12-04T12:12:57.5594408Z 2025-12-04T12:12:57.5594549Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5595112Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5595261Z Traceback (most recent call last): 2025-12-04T12:12:57.5595725Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5595932Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5596138Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5596143Z 2025-12-04T12:12:57.5596353Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5597289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5597297Z 2025-12-04T12:12:57.5597559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5597784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5597901Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5598014Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5598358Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5598574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5598682Z graph_break [] 2025-12-04T12:12:57.5598894Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5599616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5599732Z warnings.warn( 2025-12-04T12:12:57.5600284Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5600403Z Traceback (most recent call last): 2025-12-04T12:12:57.5601081Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5601283Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5601503Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5601508Z 2025-12-04T12:12:57.5601719Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5602700Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5602720Z 2025-12-04T12:12:57.5602983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5603199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5603324Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5603436Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5603843Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5604075Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5604171Z graph_break [] 2025-12-04T12:12:57.5604383Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5605159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5605259Z warnings.warn( 2025-12-04T12:12:57.5605480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5605627Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5605737Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5605961Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5606295Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5606430Z graph_break [] 2025-12-04T12:12:57.5606650Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5607364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5607474Z warnings.warn( 2025-12-04T12:12:57.5607618Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5608171Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5608303Z Traceback (most recent call last): 2025-12-04T12:12:57.5608759Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5608965Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5609174Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5609179Z 2025-12-04T12:12:57.5609388Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5610341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5610347Z 2025-12-04T12:12:57.5610606Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5610824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5610936Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5611048Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5611391Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5611607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5611705Z graph_break [] 2025-12-04T12:12:57.5611925Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5612641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5612751Z warnings.warn( 
2025-12-04T12:12:57.5612960Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5613071Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5613194Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5613409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5613740Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5613853Z graph_break [] 2025-12-04T12:12:57.5614063Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5614823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5614924Z warnings.warn( 2025-12-04T12:12:57.5615130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5615255Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5615371Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5615632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5615979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5616120Z graph_break [] 2025-12-04T12:12:57.5616328Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5617055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5617187Z warnings.warn( 2025-12-04T12:12:57.5618001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml - 2025-12-04T12:12:57.5618165Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5619229Z FAILED [0.1597s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5619249Z 2025-12-04T12:12:57.5619461Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5620384Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5620391Z 2025-12-04T12:12:57.5620662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5620835Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5621042Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.5621135Z Got exit code 1 2025-12-04T12:12:57.5621241Z Retrying single test... 
2025-12-04T12:12:57.5621883Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml 2025-12-04T12:12:57.5622044Z ============================= test session starts ============================== 2025-12-04T12:12:57.5622385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5622499Z cachedir: .pytest_cache 2025-12-04T12:12:57.5623004Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5623138Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5623245Z configfile: pytest.ini 2025-12-04T12:12:57.5623819Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5624050Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5625053Z stepcurrent: skipping 32 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5625178Z Running 1 items in this shard 2025-12-04T12:12:57.5625183Z 2025-12-04T12:12:57.5626103Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5386s] [100%] 2025-12-04T12:12:57.5626992Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [100%] 2025-12-04T12:12:57.5627841Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1567s] [100%] 2025-12-04T12:12:57.5627847Z 2025-12-04T12:12:57.5627984Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5628546Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5628695Z Traceback (most recent call last): 2025-12-04T12:12:57.5629160Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5629396Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5629602Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5629607Z 2025-12-04T12:12:57.5629825Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5630753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5630758Z 2025-12-04T12:12:57.5631026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5631240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5631352Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5631476Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5631806Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5632022Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5632132Z graph_break [] 2025-12-04T12:12:57.5632341Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5633072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5633175Z warnings.warn( 2025-12-04T12:12:57.5633722Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5633850Z Traceback (most recent call last): 2025-12-04T12:12:57.5634307Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5634500Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5634722Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5634729Z 2025-12-04T12:12:57.5634934Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5635871Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5635876Z 2025-12-04T12:12:57.5636135Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5636348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5636471Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5636586Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5636928Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5637139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5637239Z graph_break [] 2025-12-04T12:12:57.5637491Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5638211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5638308Z warnings.warn( 2025-12-04T12:12:57.5638529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5638663Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5638785Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5638999Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5639358Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5639467Z graph_break [] 2025-12-04T12:12:57.5639677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5640390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5640528Z warnings.warn( 2025-12-04T12:12:57.5640669Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5641233Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5641353Z Traceback (most recent call last): 2025-12-04T12:12:57.5641808Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5642012Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5642287Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5642295Z 2025-12-04T12:12:57.5642504Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5643450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5643457Z 2025-12-04T12:12:57.5643714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5643941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5644052Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5644164Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5644508Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5644723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5644835Z graph_break [] 2025-12-04T12:12:57.5645045Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5645757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5645869Z warnings.warn( 
2025-12-04T12:12:57.5646075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5646182Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5646303Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5646517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5646855Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5650025Z graph_break [] 2025-12-04T12:12:57.5650246Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5650981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5651080Z warnings.warn( 2025-12-04T12:12:57.5651359Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5651482Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5651593Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5651807Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5652153Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5652281Z graph_break [] 2025-12-04T12:12:57.5652496Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5653217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5653350Z warnings.warn( 2025-12-04T12:12:57.5654170Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml - 2025-12-04T12:12:57.5654371Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5655434Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5655441Z 2025-12-04T12:12:57.5655666Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5656598Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5656606Z 2025-12-04T12:12:57.5656876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5657050Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5657248Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.5657356Z Got exit code 1 2025-12-04T12:12:57.5658195Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5658605Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5659229Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml 2025-12-04T12:12:57.5659392Z ============================= test session starts ============================== 2025-12-04T12:12:57.5659743Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5659848Z cachedir: .pytest_cache 2025-12-04T12:12:57.5660366Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5660494Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5660601Z configfile: pytest.ini 2025-12-04T12:12:57.5661190Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5661415Z collecting ... collected 380 items / 33 deselected / 347 selected 2025-12-04T12:12:57.5661556Z stepcurrent: skipping 33 already run items. 2025-12-04T12:12:57.5661679Z Running 142 items in this shard 2025-12-04T12:12:57.5661794Z 2025-12-04T12:12:57.5662690Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5453s] [ 0%] 2025-12-04T12:12:57.5663626Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [ 0%] 2025-12-04T12:12:57.5664442Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1587s] [ 0%] 2025-12-04T12:12:57.5664447Z 2025-12-04T12:12:57.5664633Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5665186Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5665306Z Traceback (most recent call last): 2025-12-04T12:12:57.5665780Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5665974Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5666185Z ValueError: not enough values to unpack (expected 
2, got 0) 2025-12-04T12:12:57.5666239Z 2025-12-04T12:12:57.5666449Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5667371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5667376Z 2025-12-04T12:12:57.5667653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5667868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5667993Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5668108Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5668437Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5668664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5668761Z graph_break [] 2025-12-04T12:12:57.5668972Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5669700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5669799Z warnings.warn( 2025-12-04T12:12:57.5670362Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5670480Z Traceback (most recent call last): 2025-12-04T12:12:57.5670938Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5671146Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5671351Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5671356Z 2025-12-04T12:12:57.5671566Z To execute this test, run the 
following from the base repo dir: 2025-12-04T12:12:57.5672507Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5672513Z 2025-12-04T12:12:57.5672773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5673000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5673110Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5673223Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5673626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5673843Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5673952Z graph_break [] 2025-12-04T12:12:57.5674161Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5674920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5675036Z warnings.warn( 2025-12-04T12:12:57.5675247Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5675357Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5675485Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5675728Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5676077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5676178Z graph_break [] 2025-12-04T12:12:57.5676391Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5677118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5677252Z warnings.warn( 2025-12-04T12:12:57.5677393Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5677958Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5678080Z Traceback (most recent call last): 2025-12-04T12:12:57.5678559Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5678755Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5678966Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5678971Z 2025-12-04T12:12:57.5679196Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5680122Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5680129Z 2025-12-04T12:12:57.5680404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5680617Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5680727Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5680853Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5681185Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5681403Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5681516Z graph_break [] 2025-12-04T12:12:57.5681727Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5682551Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5682655Z warnings.warn( 2025-12-04T12:12:57.5682863Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5682988Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5683100Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5683316Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5683660Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5683758Z graph_break [] 2025-12-04T12:12:57.5683982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5684741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5684837Z warnings.warn( 2025-12-04T12:12:57.5685060Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5685209Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5685322Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5685545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5685872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5685981Z graph_break [] 2025-12-04T12:12:57.5686187Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5686924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5687035Z warnings.warn( 2025-12-04T12:12:57.5687840Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml - 2025-12-04T12:12:57.5688018Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5689115Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5689121Z 2025-12-04T12:12:57.5689330Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5690270Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5690278Z 2025-12-04T12:12:57.5690539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5690727Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5690917Z ================== 1 failed, 33 deselected, 2 rerun in 4.92s =================== 2025-12-04T12:12:57.5691016Z Got exit code 1 2025-12-04T12:12:57.5691136Z Retrying single test... 
2025-12-04T12:12:57.5691758Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml 2025-12-04T12:12:57.5691931Z ============================= test session starts ============================== 2025-12-04T12:12:57.5692279Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5692388Z cachedir: .pytest_cache 2025-12-04T12:12:57.5692909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5693036Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5693144Z configfile: pytest.ini 2025-12-04T12:12:57.5693732Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5693961Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5694982Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5695095Z Running 1 items in this shard 2025-12-04T12:12:57.5695100Z 2025-12-04T12:12:57.5695987Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5542s] [100%] 2025-12-04T12:12:57.5696927Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1622s] [100%] 2025-12-04T12:12:57.5697771Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [100%] 2025-12-04T12:12:57.5697779Z 2025-12-04T12:12:57.5697929Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5698478Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5698609Z Traceback (most recent call last): 2025-12-04T12:12:57.5699116Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5699312Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5699534Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5699539Z 2025-12-04T12:12:57.5699747Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5700685Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5700735Z 2025-12-04T12:12:57.5701175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5701389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5701514Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5701631Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5701964Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5702199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5702298Z graph_break [] 2025-12-04T12:12:57.5702527Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5703251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5703355Z warnings.warn( 2025-12-04T12:12:57.5703916Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5704037Z Traceback (most recent call last): 2025-12-04T12:12:57.5704495Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5704707Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5704912Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5704920Z 2025-12-04T12:12:57.5705142Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5706071Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5706080Z 2025-12-04T12:12:57.5706350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5706566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5706676Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5706804Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5707138Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5707353Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5707462Z graph_break [] 2025-12-04T12:12:57.5707761Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5708494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5708597Z warnings.warn( 2025-12-04T12:12:57.5708853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5708979Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5709090Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5709304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5709650Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5709749Z graph_break [] 2025-12-04T12:12:57.5710000Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5710726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5710827Z warnings.warn( 2025-12-04T12:12:57.5710980Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5711532Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5711692Z Traceback (most recent call last): 2025-12-04T12:12:57.5712168Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5712363Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5712582Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5712586Z 2025-12-04T12:12:57.5712797Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5713730Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5713738Z 2025-12-04T12:12:57.5714010Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5714222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5714344Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5714459Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5714789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5715011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5715106Z graph_break [] 2025-12-04T12:12:57.5715318Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5716040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5716140Z warnings.warn( 
2025-12-04T12:12:57.5716359Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5716466Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5716578Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5716806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5717133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5717226Z graph_break [] 2025-12-04T12:12:57.5717446Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5718162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5718270Z warnings.warn( 2025-12-04T12:12:57.5718514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5718621Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5718745Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5718959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5719315Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5719427Z graph_break [] 2025-12-04T12:12:57.5719637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5720360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5720458Z warnings.warn( 2025-12-04T12:12:57.5721284Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml - 2025-12-04T12:12:57.5721469Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5722603Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5722645Z 2025-12-04T12:12:57.5722874Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5723797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5723803Z 2025-12-04T12:12:57.5724062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5724255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5724451Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.5724559Z Got exit code 1 2025-12-04T12:12:57.5724662Z Retrying single test... 
2025-12-04T12:12:57.5725283Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml 2025-12-04T12:12:57.5725457Z ============================= test session starts ============================== 2025-12-04T12:12:57.5725797Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5725904Z cachedir: .pytest_cache 2025-12-04T12:12:57.5726424Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5726545Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5726663Z configfile: pytest.ini 2025-12-04T12:12:57.5727236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5727460Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5728490Z stepcurrent: skipping 33 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5728605Z Running 1 items in this shard 2025-12-04T12:12:57.5728611Z 2025-12-04T12:12:57.5729509Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5321s] [100%] 2025-12-04T12:12:57.5730393Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%] 2025-12-04T12:12:57.5731259Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1572s] [100%] 2025-12-04T12:12:57.5731264Z 2025-12-04T12:12:57.5731413Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5731995Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5732127Z Traceback (most recent call last): 2025-12-04T12:12:57.5732589Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5732800Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5733042Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5733047Z 2025-12-04T12:12:57.5733259Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5734214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5734219Z 2025-12-04T12:12:57.5734481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5734742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5734854Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5734969Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5735317Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5735531Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5735629Z graph_break [] 2025-12-04T12:12:57.5735858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5736584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5736700Z warnings.warn( 2025-12-04T12:12:57.5737255Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5737376Z Traceback (most recent call last): 2025-12-04T12:12:57.5737854Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5738047Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5738256Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5738260Z 2025-12-04T12:12:57.5738485Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5739405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5739412Z 2025-12-04T12:12:57.5739687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5739901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5740018Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5740149Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5740481Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5740714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5740814Z graph_break [] 2025-12-04T12:12:57.5741026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5741762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5741902Z warnings.warn( 2025-12-04T12:12:57.5742111Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5742231Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5742345Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5742604Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5742938Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5743033Z graph_break [] 2025-12-04T12:12:57.5743259Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5744022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5744127Z warnings.warn( 2025-12-04T12:12:57.5744284Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5744841Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5744976Z Traceback (most recent call last): 2025-12-04T12:12:57.5745436Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5745661Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5745877Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5745882Z 2025-12-04T12:12:57.5746087Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5747028Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5747033Z 2025-12-04T12:12:57.5747289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5747500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5747621Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5747733Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5748062Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5748287Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5748379Z graph_break [] 2025-12-04T12:12:57.5748598Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5749316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5749410Z warnings.warn( 
2025-12-04T12:12:57.5749632Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5749742Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5749853Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5750079Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5750405Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5750517Z graph_break [] 2025-12-04T12:12:57.5750728Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5751440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5751551Z warnings.warn( 2025-12-04T12:12:57.5751758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5751868Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5751990Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5752206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5752586Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5752684Z graph_break [] 2025-12-04T12:12:57.5752892Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5753657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5753756Z warnings.warn( 2025-12-04T12:12:57.5754566Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml - 2025-12-04T12:12:57.5754764Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5755830Z FAILED [0.1572s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5755838Z 2025-12-04T12:12:57.5756061Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5756986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5757023Z 2025-12-04T12:12:57.5757299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5757474Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5757668Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.5757778Z Got exit code 1 2025-12-04T12:12:57.5758621Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5759036Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5759664Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml 2025-12-04T12:12:57.5759833Z ============================= test session starts ============================== 2025-12-04T12:12:57.5760186Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5760293Z cachedir: .pytest_cache 2025-12-04T12:12:57.5760815Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5760936Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5761042Z configfile: pytest.ini 2025-12-04T12:12:57.5761631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5761857Z collecting ... collected 380 items / 34 deselected / 346 selected 2025-12-04T12:12:57.5761997Z stepcurrent: skipping 34 already run items. 2025-12-04T12:12:57.5762229Z Running 141 items in this shard 2025-12-04T12:12:57.5762242Z 2025-12-04T12:12:57.5763261Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.5764171Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5439s] [ 1%] 2025-12-04T12:12:57.5765049Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1619s] [ 1%] 2025-12-04T12:12:57.5765919Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1564s] [ 1%] 2025-12-04T12:12:57.5765927Z 2025-12-04T12:12:57.5766096Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5766640Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5766773Z Traceback (most recent call last): 2025-12-04T12:12:57.5767233Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5767471Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5767679Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5767687Z 2025-12-04T12:12:57.5767894Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5768828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5768866Z 2025-12-04T12:12:57.5769130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5769355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5769466Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5769580Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5769925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5770139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5770234Z graph_break [] 2025-12-04T12:12:57.5770460Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5771177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5771288Z warnings.warn( 2025-12-04T12:12:57.5771837Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5771953Z Traceback (most recent call last): 2025-12-04T12:12:57.5772422Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.5772614Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5772820Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5772839Z 2025-12-04T12:12:57.5773045Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5773963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5773969Z 2025-12-04T12:12:57.5774238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5774455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5774564Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5774687Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5775016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5775242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5775338Z graph_break [] 2025-12-04T12:12:57.5775548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5776272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5776415Z warnings.warn( 2025-12-04T12:12:57.5776624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5776772Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5776890Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5777117Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5777444Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.5777539Z graph_break [] 2025-12-04T12:12:57.5777760Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5778504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5778607Z warnings.warn( 2025-12-04T12:12:57.5778760Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5779307Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5779437Z Traceback (most recent call last): 2025-12-04T12:12:57.5779934Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5780126Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5780344Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5780349Z 2025-12-04T12:12:57.5780557Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5781493Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5781501Z 2025-12-04T12:12:57.5781764Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5781974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5782094Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5782212Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5782554Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5782770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 
1)] 2025-12-04T12:12:57.5782865Z graph_break [] 2025-12-04T12:12:57.5783084Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5783796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5783894Z warnings.warn( 2025-12-04T12:12:57.5784118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5784227Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5784352Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5784564Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5784898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5822241Z graph_break [] 2025-12-04T12:12:57.5822640Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5823377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5823517Z warnings.warn( 2025-12-04T12:12:57.5823735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5823846Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5824182Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5824406Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5824758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5824855Z graph_break [] 2025-12-04T12:12:57.5825142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5825874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5825974Z warnings.warn( 2025-12-04T12:12:57.5826830Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml - 2025-12-04T12:12:57.5827012Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5828075Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5828084Z 2025-12-04T12:12:57.5828316Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5829292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5829299Z 2025-12-04T12:12:57.5829577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5829759Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5829969Z ============= 1 failed, 1 skipped, 34 deselected, 2 rerun in 4.92s ============= 2025-12-04T12:12:57.5830086Z Got exit code 1 2025-12-04T12:12:57.5830194Z Retrying single test... 
2025-12-04T12:12:57.5830825Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml 2025-12-04T12:12:57.5831000Z ============================= test session starts ============================== 2025-12-04T12:12:57.5831349Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5831467Z cachedir: .pytest_cache 2025-12-04T12:12:57.5831982Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5832111Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5832230Z configfile: pytest.ini 2025-12-04T12:12:57.5832808Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5833030Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5834044Z stepcurrent: skipping 35 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5834156Z Running 1 items in this shard 2025-12-04T12:12:57.5834166Z 2025-12-04T12:12:57.5835063Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5284s] [100%] 2025-12-04T12:12:57.5835950Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%] 2025-12-04T12:12:57.5836765Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1580s] [100%] 2025-12-04T12:12:57.5836807Z 2025-12-04T12:12:57.5836947Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5837524Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5837661Z Traceback (most recent call last): 2025-12-04T12:12:57.5838127Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5838335Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5838540Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5838545Z 2025-12-04T12:12:57.5838786Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5839728Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5839736Z 2025-12-04T12:12:57.5839995Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5840219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5840378Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5840489Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5840836Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5841050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5841145Z graph_break [] 2025-12-04T12:12:57.5841372Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5842102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5842303Z warnings.warn( 2025-12-04T12:12:57.5842847Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5842965Z Traceback (most recent call last): 2025-12-04T12:12:57.5843579Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5843781Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5843988Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5844009Z 2025-12-04T12:12:57.5844273Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5845194Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5845201Z 2025-12-04T12:12:57.5845474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5845687Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5845795Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5845920Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5846256Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5846481Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5846575Z graph_break [] 2025-12-04T12:12:57.5846786Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5847520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5847619Z warnings.warn( 2025-12-04T12:12:57.5847831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5848014Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5848127Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5848357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5848721Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5848821Z graph_break [] 2025-12-04T12:12:57.5849050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5849760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5849857Z warnings.warn( 2025-12-04T12:12:57.5850048Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5850594Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5850729Z Traceback (most recent call last): 2025-12-04T12:12:57.5851188Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5851382Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5851642Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5851648Z 2025-12-04T12:12:57.5851855Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5852792Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5852797Z 2025-12-04T12:12:57.5853058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5853272Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5853398Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5853509Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5853854Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5854068Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5854169Z graph_break [] 2025-12-04T12:12:57.5854392Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5855106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5855203Z warnings.warn( 
2025-12-04T12:12:57.5855428Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5855535Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5855661Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5855874Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5856202Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5856311Z graph_break [] 2025-12-04T12:12:57.5856526Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5857238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5857352Z warnings.warn( 2025-12-04T12:12:57.5857559Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5857679Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5857788Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5858003Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5858352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5858500Z graph_break [] 2025-12-04T12:12:57.5858708Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5859459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5859559Z warnings.warn( 2025-12-04T12:12:57.5860371Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml - 2025-12-04T12:12:57.5860539Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5861623Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5861632Z 2025-12-04T12:12:57.5861858Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5862777Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5862818Z 2025-12-04T12:12:57.5863093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5863267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5863461Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.5863567Z Got exit code 1 2025-12-04T12:12:57.5863669Z Retrying single test... 
2025-12-04T12:12:57.5864305Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml 2025-12-04T12:12:57.5864464Z ============================= test session starts ============================== 2025-12-04T12:12:57.5864807Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5864929Z cachedir: .pytest_cache 2025-12-04T12:12:57.5865440Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5865566Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5865683Z configfile: pytest.ini 2025-12-04T12:12:57.5866258Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5866496Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5867500Z stepcurrent: skipping 35 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5867615Z Running 1 items in this shard 2025-12-04T12:12:57.5867620Z 2025-12-04T12:12:57.5868516Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5563s] [100%] 2025-12-04T12:12:57.5869406Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1610s] [100%] 2025-12-04T12:12:57.5870223Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1585s] [100%] 2025-12-04T12:12:57.5870231Z 2025-12-04T12:12:57.5870368Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5870928Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5871088Z Traceback (most recent call last): 2025-12-04T12:12:57.5871550Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5871789Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5872000Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5872005Z 2025-12-04T12:12:57.5872228Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5873258Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5873263Z 2025-12-04T12:12:57.5873522Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5873752Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5873863Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5873973Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5874316Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5874562Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5874672Z graph_break [] 2025-12-04T12:12:57.5874882Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5875597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5875710Z warnings.warn( 2025-12-04T12:12:57.5876255Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5876387Z Traceback (most recent call last): 2025-12-04T12:12:57.5876845Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5877034Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5877253Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5877263Z 2025-12-04T12:12:57.5877471Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5878397Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5878415Z 2025-12-04T12:12:57.5878674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5878883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5879007Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5879122Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5879454Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5879681Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5879776Z graph_break [] 2025-12-04T12:12:57.5880003Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5880718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5880815Z warnings.warn( 2025-12-04T12:12:57.5881038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5881146Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5881261Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5881484Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5881865Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5881975Z graph_break [] 2025-12-04T12:12:57.5882265Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5883016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5883132Z warnings.warn( 2025-12-04T12:12:57.5883270Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5883820Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.5884000Z Traceback (most recent call last): 2025-12-04T12:12:57.5884462Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5884669Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5884876Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5884881Z 2025-12-04T12:12:57.5885087Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5886026Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5886065Z 2025-12-04T12:12:57.5886328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5886555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5886664Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5886778Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5887122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5887339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5887437Z graph_break [] 2025-12-04T12:12:57.5887663Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5888380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5888495Z warnings.warn( 
2025-12-04T12:12:57.5888705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5888811Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5888935Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5889151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5889476Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5889584Z graph_break [] 2025-12-04T12:12:57.5889796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5890519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5890615Z warnings.warn( 2025-12-04T12:12:57.5890829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5890946Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5891058Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5891271Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5891609Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5891703Z graph_break [] 2025-12-04T12:12:57.5891927Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5892637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5892770Z warnings.warn( 2025-12-04T12:12:57.5893582Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml - 2025-12-04T12:12:57.5893802Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5894871Z FAILED [0.1585s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5894877Z 2025-12-04T12:12:57.5895119Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5896038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5896045Z 2025-12-04T12:12:57.5896321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5896499Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5896740Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.5896841Z Got exit code 1 2025-12-04T12:12:57.5897684Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.5898102Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5898727Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml 2025-12-04T12:12:57.5898903Z ============================= test session starts ============================== 2025-12-04T12:12:57.5899248Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5899357Z cachedir: .pytest_cache 2025-12-04T12:12:57.5899880Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5900005Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5900116Z configfile: pytest.ini 2025-12-04T12:12:57.5900705Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5901189Z collecting ... collected 380 items / 36 deselected / 344 selected 2025-12-04T12:12:57.5901350Z stepcurrent: skipping 36 already run items. 2025-12-04T12:12:57.5901462Z Running 139 items in this shard 2025-12-04T12:12:57.5901468Z 2025-12-04T12:12:57.5902479Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.5903384Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5489s] [ 1%] 2025-12-04T12:12:57.5904271Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1604s] [ 1%] 2025-12-04T12:12:57.5905095Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [ 1%] 2025-12-04T12:12:57.5905101Z 2025-12-04T12:12:57.5905347Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5905911Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5906031Z Traceback (most recent call last): 2025-12-04T12:12:57.5906540Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5906752Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5906964Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5906969Z 2025-12-04T12:12:57.5907190Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5908166Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5908174Z 2025-12-04T12:12:57.5908438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5908665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5908774Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5908902Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5909276Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5909492Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5909608Z graph_break [] 2025-12-04T12:12:57.5909818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5910540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5910654Z warnings.warn( 2025-12-04T12:12:57.5911205Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5911339Z Traceback (most recent call last): 2025-12-04T12:12:57.5911797Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.5911991Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5912213Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5912218Z 2025-12-04T12:12:57.5912425Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5913356Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5913378Z 2025-12-04T12:12:57.5913639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5913854Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5913975Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5914084Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5914413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5914644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5914738Z graph_break [] 2025-12-04T12:12:57.5914959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5915675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5915775Z warnings.warn( 2025-12-04T12:12:57.5915995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5916101Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5916249Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5916474Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5916803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.5916913Z graph_break [] 2025-12-04T12:12:57.5917147Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5917862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5917971Z warnings.warn( 2025-12-04T12:12:57.5918110Z =================================== FAILURES =================================== 2025-12-04T12:12:57.5918691Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5918821Z Traceback (most recent call last): 2025-12-04T12:12:57.5919286Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5919492Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5919698Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5919703Z 2025-12-04T12:12:57.5919912Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5920879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5920917Z 2025-12-04T12:12:57.5921175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5921398Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5921506Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5921616Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5921957Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5922238Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('not_ok', 1)] 2025-12-04T12:12:57.5922333Z graph_break [] 2025-12-04T12:12:57.5922561Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5923275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5923388Z warnings.warn( 2025-12-04T12:12:57.5923594Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5923704Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5923829Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5924045Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5924372Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5924485Z graph_break [] 2025-12-04T12:12:57.5924696Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5925424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5925522Z warnings.warn( 2025-12-04T12:12:57.5925731Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5925860Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5925970Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5926185Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5926528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5926623Z graph_break [] 2025-12-04T12:12:57.5926843Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5927598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5927696Z warnings.warn( 2025-12-04T12:12:57.5928552Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml - 2025-12-04T12:12:57.5928725Z =========================== short test summary info ============================ 2025-12-04T12:12:57.5929817Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5929824Z 2025-12-04T12:12:57.5930036Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5930966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5930985Z 2025-12-04T12:12:57.5931247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5931460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5931684Z ============= 1 failed, 1 skipped, 36 deselected, 2 rerun in 4.92s ============= 2025-12-04T12:12:57.5931779Z Got exit code 1 2025-12-04T12:12:57.5931888Z Retrying single test... 
2025-12-04T12:12:57.5932528Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml 2025-12-04T12:12:57.5932691Z ============================= test session starts ============================== 2025-12-04T12:12:57.5933044Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5933153Z cachedir: .pytest_cache 2025-12-04T12:12:57.5933660Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5933793Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5933905Z configfile: pytest.ini 2025-12-04T12:12:57.5934481Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5934714Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5935722Z stepcurrent: skipping 37 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5935846Z Running 1 items in this shard 2025-12-04T12:12:57.5935853Z 2025-12-04T12:12:57.5936750Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5561s] [100%] 2025-12-04T12:12:57.5937659Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1639s] [100%] 2025-12-04T12:12:57.5938466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%] 2025-12-04T12:12:57.5938471Z 2025-12-04T12:12:57.5938608Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5939174Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5939343Z Traceback (most recent call last): 2025-12-04T12:12:57.5939817Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5940010Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5940217Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5940256Z 2025-12-04T12:12:57.5940478Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5941409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5941414Z 2025-12-04T12:12:57.5941717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5941931Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5942041Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5942167Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5942499Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5942717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5942823Z graph_break [] 2025-12-04T12:12:57.5943079Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5943811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5943910Z warnings.warn( 2025-12-04T12:12:57.5944459Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5944595Z Traceback (most recent call last): 2025-12-04T12:12:57.5945055Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5945265Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5945471Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5945476Z 2025-12-04T12:12:57.5945686Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5946633Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5946638Z 2025-12-04T12:12:57.5946899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5947127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5947239Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5947351Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5947698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5947916Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5948012Z graph_break [] 2025-12-04T12:12:57.5948236Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5948958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5949075Z warnings.warn( 2025-12-04T12:12:57.5949286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5949394Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5949521Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5949738Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5950065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5950217Z graph_break [] 2025-12-04T12:12:57.5950424Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5951151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5951283Z warnings.warn( 2025-12-04T12:12:57.5951424Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5951995Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5952117Z Traceback (most recent call last): 2025-12-04T12:12:57.5952621Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5952815Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5953021Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5953028Z 2025-12-04T12:12:57.5953250Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5954180Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5954214Z 2025-12-04T12:12:57.5954485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5954694Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5954806Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5954932Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5955262Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5955471Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5955580Z graph_break [] 2025-12-04T12:12:57.5955785Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5956510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5956609Z warnings.warn( 
2025-12-04T12:12:57.5956818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5956936Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5957045Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5957256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5957595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5957690Z graph_break [] 2025-12-04T12:12:57.5957910Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5958619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5958717Z warnings.warn( 2025-12-04T12:12:57.5958936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5959044Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5959159Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5959385Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5959713Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5959819Z graph_break [] 2025-12-04T12:12:57.5960027Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5960735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5960878Z warnings.warn( 2025-12-04T12:12:57.5961671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml - 2025-12-04T12:12:57.5961842Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5963025Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5963033Z 2025-12-04T12:12:57.5963246Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5964214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5964222Z 2025-12-04T12:12:57.5964482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5964669Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5964864Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.5964960Z Got exit code 1 2025-12-04T12:12:57.5965116Z Retrying single test... 
2025-12-04T12:12:57.5965741Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml 2025-12-04T12:12:57.5965899Z ============================= test session starts ============================== 2025-12-04T12:12:57.5966255Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.5966364Z cachedir: .pytest_cache 2025-12-04T12:12:57.5966880Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.5967003Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.5967110Z configfile: pytest.ini 2025-12-04T12:12:57.5967699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.5967924Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.5968948Z stepcurrent: skipping 37 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5969058Z Running 1 items in this shard 2025-12-04T12:12:57.5969063Z 2025-12-04T12:12:57.5969950Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5387s] [100%] 2025-12-04T12:12:57.5970850Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1601s] [100%] 2025-12-04T12:12:57.5971658Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1579s] [100%] 2025-12-04T12:12:57.5971665Z 2025-12-04T12:12:57.5971814Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.5972363Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5972483Z Traceback (most recent call last): 2025-12-04T12:12:57.5972958Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5973153Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5973428Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5973433Z 2025-12-04T12:12:57.5973640Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5974600Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5974623Z 2025-12-04T12:12:57.5974884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5975098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5975222Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5975333Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5975690Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5975921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5976021Z graph_break [] 2025-12-04T12:12:57.5976250Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5976967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5977098Z warnings.warn( 2025-12-04T12:12:57.5977658Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5977776Z Traceback (most recent call last): 2025-12-04T12:12:57.5978229Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5978436Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5978640Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5978648Z 2025-12-04T12:12:57.5978866Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5979793Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5979803Z 2025-12-04T12:12:57.5980060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5980283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5980391Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5980514Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5980845Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5981062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5981167Z graph_break [] 2025-12-04T12:12:57.5981376Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5982089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5982199Z warnings.warn( 2025-12-04T12:12:57.5982407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5982532Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5982642Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5982852Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5983192Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5983287Z graph_break [] 2025-12-04T12:12:57.5983496Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5984220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5984350Z warnings.warn( 2025-12-04T12:12:57.5984505Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.5985086Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.5985206Z Traceback (most recent call last): 2025-12-04T12:12:57.5985672Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.5985863Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.5986066Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5986083Z 2025-12-04T12:12:57.5986318Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5987243Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5987251Z 2025-12-04T12:12:57.5987520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5987732Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5987883Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5987996Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5988324Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5988547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5988641Z graph_break [] 2025-12-04T12:12:57.5988852Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5989576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5989675Z warnings.warn( 
2025-12-04T12:12:57.5989898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5990005Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5990114Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5990344Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5990670Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5990765Z graph_break [] 2025-12-04T12:12:57.5990988Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5991700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5991812Z warnings.warn( 2025-12-04T12:12:57.5992022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.5992135Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.5992259Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.5992472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.5992800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.5992912Z graph_break [] 2025-12-04T12:12:57.5993120Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.5993830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.5993938Z warnings.warn( 2025-12-04T12:12:57.5994739Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml - 2025-12-04T12:12:57.5994954Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.5996014Z FAILED [0.1579s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.5996053Z 2025-12-04T12:12:57.5996278Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.5997203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5997208Z 2025-12-04T12:12:57.5997495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.5997684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.5997880Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.5997991Z Got exit code 1 2025-12-04T12:12:57.5998834Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.5999268Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.5999903Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml 2025-12-04T12:12:57.6000064Z ============================= test session starts ============================== 2025-12-04T12:12:57.6000419Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6000527Z cachedir: .pytest_cache 2025-12-04T12:12:57.6001464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6001605Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6001711Z configfile: pytest.ini 2025-12-04T12:12:57.6002350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6002597Z collecting ... collected 380 items / 38 deselected / 342 selected 2025-12-04T12:12:57.6002737Z stepcurrent: skipping 38 already run items. 2025-12-04T12:12:57.6002863Z Running 137 items in this shard 2025-12-04T12:12:57.6002868Z 2025-12-04T12:12:57.6003880Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6004874Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) 
[ 1%] 2025-12-04T12:12:57.6005771Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5710s] [ 2%] 2025-12-04T12:12:57.6006654Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1680s] [ 2%] 2025-12-04T12:12:57.6007472Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1655s] [ 2%] 2025-12-04T12:12:57.6007480Z 2025-12-04T12:12:57.6007618Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6008173Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6008380Z Traceback (most recent call last): 2025-12-04T12:12:57.6008842Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6009109Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6009323Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6009328Z 2025-12-04T12:12:57.6009551Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6010510Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6010516Z 2025-12-04T12:12:57.6010779Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6011012Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.6011127Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6011256Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6011588Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6011865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6011979Z graph_break [] 2025-12-04T12:12:57.6012192Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6014862Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6014988Z return x.grad, w.grad 2025-12-04T12:12:57.6015708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6015824Z warnings.warn( 2025-12-04T12:12:57.6018461Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. 
(Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6018583Z return x.grad, w.grad 2025-12-04T12:12:57.6019123Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6019256Z Traceback (most recent call last): 2025-12-04T12:12:57.6019714Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6019909Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6020132Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6020137Z 2025-12-04T12:12:57.6020343Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6021263Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6021318Z 2025-12-04T12:12:57.6021577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6021790Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6021911Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6022053Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6022386Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6022612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6022706Z graph_break [] 2025-12-04T12:12:57.6022925Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6025603Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6025748Z return x.grad, w.grad 2025-12-04T12:12:57.6026466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6026562Z warnings.warn( 2025-12-04T12:12:57.6029212Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6029317Z return x.grad, w.grad 2025-12-04T12:12:57.6029549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6029655Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6029766Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6029992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6030319Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6030415Z graph_break [] 2025-12-04T12:12:57.6030637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6033268Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6033389Z return x.grad, w.grad 2025-12-04T12:12:57.6034102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6034221Z warnings.warn( 2025-12-04T12:12:57.6036891Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6037042Z return x.grad, w.grad 2025-12-04T12:12:57.6037183Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6037720Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6037880Z Traceback (most recent call last): 2025-12-04T12:12:57.6038340Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6038544Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6038750Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6038755Z 2025-12-04T12:12:57.6038964Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6039896Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6039931Z 2025-12-04T12:12:57.6040193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6040415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6040522Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6040634Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6040977Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6041190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6041286Z graph_break [] 2025-12-04T12:12:57.6041508Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6044227Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6044350Z return x.grad, w.grad 2025-12-04T12:12:57.6045065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6045180Z warnings.warn( 2025-12-04T12:12:57.6047840Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6047960Z return x.grad, w.grad 2025-12-04T12:12:57.6048177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6048287Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6048455Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6048672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6049004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6049110Z graph_break [] 2025-12-04T12:12:57.6049348Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6052026Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6052133Z return x.grad, w.grad 2025-12-04T12:12:57.6052844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6052954Z warnings.warn( 2025-12-04T12:12:57.6055581Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6055736Z return x.grad, w.grad 2025-12-04T12:12:57.6055953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6056074Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6056186Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6056402Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6056746Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6056843Z graph_break [] 2025-12-04T12:12:57.6057050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6057771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6057870Z warnings.warn( 2025-12-04T12:12:57.6060505Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6060612Z return x.grad, w.grad 2025-12-04T12:12:57.6061427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml - 2025-12-04T12:12:57.6061593Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6062638Z FAILED [0.1655s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6062690Z 2025-12-04T12:12:57.6062899Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6063914Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6063922Z 2025-12-04T12:12:57.6064196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6064370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6064580Z ============= 1 failed, 2 skipped, 38 deselected, 2 rerun in 4.97s ============= 2025-12-04T12:12:57.6064688Z Got exit code 1 2025-12-04T12:12:57.6064836Z Retrying single test... 
2025-12-04T12:12:57.6065475Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml 2025-12-04T12:12:57.6065634Z ============================= test session starts ============================== 2025-12-04T12:12:57.6065973Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6066098Z cachedir: .pytest_cache 2025-12-04T12:12:57.6066607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6066769Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6066877Z configfile: pytest.ini 2025-12-04T12:12:57.6067451Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6067687Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6068687Z stepcurrent: skipping 40 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6068800Z Running 1 items in this shard 2025-12-04T12:12:57.6068816Z 2025-12-04T12:12:57.6069699Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5613s] [100%] 2025-12-04T12:12:57.6070580Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1662s] [100%] 2025-12-04T12:12:57.6071393Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1638s] [100%] 2025-12-04T12:12:57.6071399Z 2025-12-04T12:12:57.6071535Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6072088Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6072205Z Traceback (most recent call last): 2025-12-04T12:12:57.6072665Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6072870Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6073077Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6073081Z 2025-12-04T12:12:57.6073300Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6074222Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6074227Z 2025-12-04T12:12:57.6074487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6074762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6074872Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6074996Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6075356Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6075571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6075677Z graph_break [] 2025-12-04T12:12:57.6075887Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6078577Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6078685Z return x.grad, w.grad 2025-12-04T12:12:57.6079408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6079547Z warnings.warn( 2025-12-04T12:12:57.6082271Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6082394Z return x.grad, w.grad 2025-12-04T12:12:57.6082932Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6083077Z Traceback (most recent call last): 2025-12-04T12:12:57.6083537Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6083731Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6083949Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6083955Z 2025-12-04T12:12:57.6084165Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6085093Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6085100Z 2025-12-04T12:12:57.6085361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6085573Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6085703Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6085816Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6086146Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6086370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.6086465Z graph_break [] 2025-12-04T12:12:57.6086691Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6089382Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6089531Z return x.grad, w.grad 2025-12-04T12:12:57.6090245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6090343Z warnings.warn( 2025-12-04T12:12:57.6093033Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6093171Z return x.grad, w.grad 2025-12-04T12:12:57.6093406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6093514Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6093624Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6093853Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6094187Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6094302Z graph_break [] 2025-12-04T12:12:57.6094514Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6097155Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6097276Z return x.grad, w.grad 2025-12-04T12:12:57.6097994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6098108Z warnings.warn( 2025-12-04T12:12:57.6100743Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6101043Z return x.grad, w.grad 2025-12-04T12:12:57.6101188Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6101730Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6101868Z Traceback (most recent call last): 2025-12-04T12:12:57.6102329Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6102617Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6102826Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6102831Z 2025-12-04T12:12:57.6103040Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6104020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6104027Z 2025-12-04T12:12:57.6104290Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6104518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6104667Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6104781Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6105125Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6105340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6105436Z graph_break [] 2025-12-04T12:12:57.6105663Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6108307Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6108468Z return x.grad, w.grad 2025-12-04T12:12:57.6109187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6109299Z warnings.warn( 2025-12-04T12:12:57.6111937Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6112057Z return x.grad, w.grad 2025-12-04T12:12:57.6112268Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6112378Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6112504Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6112720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6113049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6113160Z graph_break [] 2025-12-04T12:12:57.6113373Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6116015Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6116149Z return x.grad, w.grad 2025-12-04T12:12:57.6116876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6117019Z warnings.warn( 2025-12-04T12:12:57.6119679Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6119796Z return x.grad, w.grad 2025-12-04T12:12:57.6120004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6120122Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6120235Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6120447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6120820Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6120916Z graph_break [] 2025-12-04T12:12:57.6121124Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6121850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6121952Z warnings.warn( 2025-12-04T12:12:57.6124664Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6124775Z return x.grad, w.grad 2025-12-04T12:12:57.6125589Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml - 2025-12-04T12:12:57.6125759Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6126806Z FAILED [0.1638s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6126829Z 2025-12-04T12:12:57.6127044Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6127969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6127976Z 2025-12-04T12:12:57.6128246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6128421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6128634Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.6128755Z Got exit code 1 2025-12-04T12:12:57.6128887Z Retrying single test... 
2025-12-04T12:12:57.6129539Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml 2025-12-04T12:12:57.6129779Z ============================= test session starts ============================== 2025-12-04T12:12:57.6130150Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6130289Z cachedir: .pytest_cache 2025-12-04T12:12:57.6131471Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6131600Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6131705Z configfile: pytest.ini 2025-12-04T12:12:57.6132294Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6132546Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6133561Z stepcurrent: skipping 40 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6133675Z Running 1 items in this shard 2025-12-04T12:12:57.6133680Z 2025-12-04T12:12:57.6134568Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5443s] [100%] 2025-12-04T12:12:57.6135491Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1640s] [100%] 2025-12-04T12:12:57.6136291Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1612s] [100%] 2025-12-04T12:12:57.6136296Z 2025-12-04T12:12:57.6136446Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6136988Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6137123Z Traceback (most recent call last): 2025-12-04T12:12:57.6137586Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6137785Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6138007Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6138012Z 2025-12-04T12:12:57.6138223Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6139153Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6139158Z 2025-12-04T12:12:57.6139425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6139641Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6139767Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6139880Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6140219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6140449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6140546Z graph_break [] 2025-12-04T12:12:57.6140771Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6143416Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6143569Z return x.grad, w.grad 2025-12-04T12:12:57.6144321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6144421Z warnings.warn( 2025-12-04T12:12:57.6147087Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6147194Z return x.grad, w.grad 2025-12-04T12:12:57.6147748Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6147896Z Traceback (most recent call last): 2025-12-04T12:12:57.6148358Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6148567Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6148775Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6148780Z 2025-12-04T12:12:57.6149010Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6149925Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6149932Z 2025-12-04T12:12:57.6150202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6150418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6150529Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6150654Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6150991Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6151204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.6151313Z graph_break [] 2025-12-04T12:12:57.6151527Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6154174Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6154282Z return x.grad, w.grad 2025-12-04T12:12:57.6155000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6155108Z warnings.warn( 2025-12-04T12:12:57.6157771Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6157920Z return x.grad, w.grad 2025-12-04T12:12:57.6158132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6158254Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6158364Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6158585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6158953Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6159052Z graph_break [] 2025-12-04T12:12:57.6159262Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6161922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6162055Z return x.grad, w.grad 2025-12-04T12:12:57.6162858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6162961Z warnings.warn( 2025-12-04T12:12:57.6165608Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6165715Z return x.grad, w.grad 2025-12-04T12:12:57.6165872Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6166413Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6166532Z Traceback (most recent call last): 2025-12-04T12:12:57.6167007Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6167202Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6167409Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6167426Z 2025-12-04T12:12:57.6167637Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6168696Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6168702Z 2025-12-04T12:12:57.6169031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6169270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6169380Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6169506Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6169902Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6170132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6170228Z graph_break [] 2025-12-04T12:12:57.6170439Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6173172Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6173281Z return x.grad, w.grad 2025-12-04T12:12:57.6174014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6174112Z warnings.warn( 2025-12-04T12:12:57.6176775Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6176914Z return x.grad, w.grad 2025-12-04T12:12:57.6177125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6177249Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6177360Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6177589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6177925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6178024Z graph_break [] 2025-12-04T12:12:57.6178246Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6180901Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6181023Z return x.grad, w.grad 2025-12-04T12:12:57.6181741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6181840Z warnings.warn( 2025-12-04T12:12:57.6184500Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6184637Z return x.grad, w.grad 2025-12-04T12:12:57.6184863Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6184969Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6185096Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6185345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6185677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6185788Z graph_break [] 2025-12-04T12:12:57.6186002Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6186760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6186873Z warnings.warn( 2025-12-04T12:12:57.6189521Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6189677Z return x.grad, w.grad 2025-12-04T12:12:57.6190493Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml - 2025-12-04T12:12:57.6190681Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6191731Z FAILED [0.1612s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6191739Z 2025-12-04T12:12:57.6191954Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6192888Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6192896Z 2025-12-04T12:12:57.6193161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6193353Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.6193551Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.6193648Z Got exit code 1 2025-12-04T12:12:57.6194501Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6194906Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6195553Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml 2025-12-04T12:12:57.6195717Z ============================= test session starts ============================== 2025-12-04T12:12:57.6196060Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6196183Z cachedir: .pytest_cache 2025-12-04T12:12:57.6196699Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6196838Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6196947Z configfile: pytest.ini 2025-12-04T12:12:57.6197557Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6197789Z collecting ... collected 380 items / 41 deselected / 339 selected 2025-12-04T12:12:57.6197931Z stepcurrent: skipping 41 already run items. 2025-12-04T12:12:57.6198075Z Running 134 items in this shard 2025-12-04T12:12:57.6198080Z 2025-12-04T12:12:57.6199103Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) 
[ 0%] 2025-12-04T12:12:57.6200122Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.6201287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.6202239Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5402s] [ 2%] 2025-12-04T12:12:57.6203211Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1632s] [ 2%] 2025-12-04T12:12:57.6204019Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1588s] [ 2%] 2025-12-04T12:12:57.6204025Z 2025-12-04T12:12:57.6204184Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6204730Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6204852Z Traceback (most recent call last): 2025-12-04T12:12:57.6205336Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6205532Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6205741Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6205761Z 2025-12-04T12:12:57.6205975Z To execute this test, 
run the following from the base repo dir: 2025-12-04T12:12:57.6206900Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6206905Z 2025-12-04T12:12:57.6207175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6207392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6207511Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6207623Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6207957Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6208188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6208283Z graph_break [] 2025-12-04T12:12:57.6208491Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6209226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6209326Z warnings.warn( 2025-12-04T12:12:57.6209888Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6210058Z Traceback (most recent call last): 2025-12-04T12:12:57.6210517Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6210725Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6210973Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6210978Z 2025-12-04T12:12:57.6211187Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6212118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 
python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6212123Z 2025-12-04T12:12:57.6212424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6212652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6212764Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6212873Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6213220Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6213437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6213598Z graph_break [] 2025-12-04T12:12:57.6213809Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6214528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6214640Z warnings.warn( 2025-12-04T12:12:57.6214848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6214956Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6215084Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6215299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6215640Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6215734Z graph_break [] 2025-12-04T12:12:57.6215941Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6216674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T12:12:57.6216771Z warnings.warn( 2025-12-04T12:12:57.6216912Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6217475Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6217593Z Traceback (most recent call last): 2025-12-04T12:12:57.6218064Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6218259Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6218465Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6218470Z 2025-12-04T12:12:57.6218691Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6219623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6219629Z 2025-12-04T12:12:57.6219900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6220109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6220221Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6220344Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6220673Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6220918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6221025Z graph_break [] 2025-12-04T12:12:57.6221235Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6221993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, 
skipping 2025-12-04T12:12:57.6222097Z warnings.warn( 2025-12-04T12:12:57.6222307Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6222429Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6222543Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6222791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6223135Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6223230Z graph_break [] 2025-12-04T12:12:57.6223451Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6224167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6224294Z warnings.warn( 2025-12-04T12:12:57.6224515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6224623Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6224734Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6224960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6225289Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6225398Z graph_break [] 2025-12-04T12:12:57.6225607Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6226324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6226434Z warnings.warn( 2025-12-04T12:12:57.6227233Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml - 2025-12-04T12:12:57.6227413Z 
=========================== short test summary info ============================ 2025-12-04T12:12:57.6228463Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6228469Z 2025-12-04T12:12:57.6228685Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6229614Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6229621Z 2025-12-04T12:12:57.6229879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6230072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6230286Z ============= 1 failed, 3 skipped, 41 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.6230380Z Got exit code 1 2025-12-04T12:12:57.6230494Z Retrying single test... 
2025-12-04T12:12:57.6231125Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml 2025-12-04T12:12:57.6231300Z ============================= test session starts ============================== 2025-12-04T12:12:57.6231643Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6231793Z cachedir: .pytest_cache 2025-12-04T12:12:57.6232315Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6232434Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6232538Z configfile: pytest.ini 2025-12-04T12:12:57.6233157Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6233381Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6234431Z stepcurrent: skipping 44 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6234543Z Running 1 items in this shard 2025-12-04T12:12:57.6234548Z 2025-12-04T12:12:57.6235436Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5412s] [100%] 2025-12-04T12:12:57.6236335Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1621s] [100%] 2025-12-04T12:12:57.6237166Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1599s] [100%] 2025-12-04T12:12:57.6237172Z 2025-12-04T12:12:57.6237320Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6237865Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6237996Z Traceback (most recent call last): 2025-12-04T12:12:57.6238457Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6238651Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6238871Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6238876Z 2025-12-04T12:12:57.6239090Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6240007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6240024Z 2025-12-04T12:12:57.6240285Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6240499Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6240623Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6240735Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6241069Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6241296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6241391Z graph_break [] 2025-12-04T12:12:57.6241617Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6242415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6242513Z warnings.warn( 2025-12-04T12:12:57.6243072Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6243194Z Traceback (most recent call last): 2025-12-04T12:12:57.6243660Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6243912Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6244118Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6244123Z 2025-12-04T12:12:57.6244347Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6245299Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6245307Z 2025-12-04T12:12:57.6245582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6245794Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6245903Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6246062Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6246393Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6246610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6246717Z graph_break [] 2025-12-04T12:12:57.6246929Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6247643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6247784Z warnings.warn( 2025-12-04T12:12:57.6247993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6248114Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6248226Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6248439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6248780Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6248874Z graph_break [] 2025-12-04T12:12:57.6249083Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6249813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6249910Z warnings.warn( 2025-12-04T12:12:57.6250078Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6250625Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6250743Z Traceback (most recent call last): 2025-12-04T12:12:57.6251218Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6251419Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6251641Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6251649Z 2025-12-04T12:12:57.6251856Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6252780Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6252787Z 2025-12-04T12:12:57.6253066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6253280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6253408Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6253521Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6253853Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6254084Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6254181Z graph_break [] 2025-12-04T12:12:57.6254394Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6255162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6255264Z warnings.warn( 
2025-12-04T12:12:57.6255518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6255630Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6255742Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6255974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6256305Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6256402Z graph_break [] 2025-12-04T12:12:57.6256675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6257391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6257504Z warnings.warn( 2025-12-04T12:12:57.6257713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6257823Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6257945Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6258197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6258524Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6258637Z graph_break [] 2025-12-04T12:12:57.6258844Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6259569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6259666Z warnings.warn( 2025-12-04T12:12:57.6260468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml - 2025-12-04T12:12:57.6260652Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6261712Z FAILED [0.1599s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6261719Z 2025-12-04T12:12:57.6261945Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6262869Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6262875Z 2025-12-04T12:12:57.6263134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6263331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6263525Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.6263632Z Got exit code 1 2025-12-04T12:12:57.6263736Z Retrying single test... 
2025-12-04T12:12:57.6264363Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml 2025-12-04T12:12:57.6264531Z ============================= test session starts ============================== 2025-12-04T12:12:57.6264872Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6264976Z cachedir: .pytest_cache 2025-12-04T12:12:57.6265493Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6265616Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6265845Z configfile: pytest.ini 2025-12-04T12:12:57.6266419Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6266638Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6267686Z stepcurrent: skipping 44 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6267799Z Running 1 items in this shard 2025-12-04T12:12:57.6267804Z 2025-12-04T12:12:57.6268741Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5561s] [100%] 2025-12-04T12:12:57.6269627Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1623s] [100%] 2025-12-04T12:12:57.6270438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [100%] 2025-12-04T12:12:57.6270492Z 2025-12-04T12:12:57.6270631Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6271172Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6271305Z Traceback (most recent call last): 2025-12-04T12:12:57.6271768Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6271967Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6272184Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6272191Z 2025-12-04T12:12:57.6272398Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6273337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6273344Z 2025-12-04T12:12:57.6273606Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6273837Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6273947Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6274060Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6274407Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6274625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6274722Z graph_break [] 2025-12-04T12:12:57.6274943Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6275658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6275771Z warnings.warn( 2025-12-04T12:12:57.6276319Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6276438Z Traceback (most recent call last): 2025-12-04T12:12:57.6276906Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6277098Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6277305Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6277310Z 2025-12-04T12:12:57.6277530Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6278501Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6278506Z 2025-12-04T12:12:57.6278805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6279021Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6279129Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6279253Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6279584Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6279809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6279934Z graph_break [] 2025-12-04T12:12:57.6280145Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6280878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6280976Z warnings.warn( 2025-12-04T12:12:57.6281184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6281339Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6281452Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6281682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6282010Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6282104Z graph_break [] 2025-12-04T12:12:57.6282404Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6283122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6283222Z warnings.warn( 2025-12-04T12:12:57.6283377Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6283924Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6284064Z Traceback (most recent call last): 2025-12-04T12:12:57.6284528Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6284722Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6284943Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6284948Z 2025-12-04T12:12:57.6285157Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6286096Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6286104Z 2025-12-04T12:12:57.6286364Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6286573Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6286695Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6286809Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6287141Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6287365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6287458Z graph_break [] 2025-12-04T12:12:57.6287680Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6288398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6288541Z warnings.warn( 
2025-12-04T12:12:57.6288762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6288868Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6288977Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6289209Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6289568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6289674Z graph_break [] 2025-12-04T12:12:57.6289884Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6290596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6290738Z warnings.warn( 2025-12-04T12:12:57.6290948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6291058Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6291180Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6291392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6291733Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6291826Z graph_break [] 2025-12-04T12:12:57.6292069Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6292788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6292883Z warnings.warn( 2025-12-04T12:12:57.6293680Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml - 2025-12-04T12:12:57.6293861Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6294911Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6294917Z 2025-12-04T12:12:57.6295140Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6296064Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6296069Z 2025-12-04T12:12:57.6296339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6296516Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6296709Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.6296819Z Got exit code 1 2025-12-04T12:12:57.6297655Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6298070Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6298697Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml 2025-12-04T12:12:57.6298855Z ============================= test session starts ============================== 2025-12-04T12:12:57.6299207Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6299314Z cachedir: .pytest_cache 2025-12-04T12:12:57.6299823Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6299997Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6300104Z configfile: pytest.ini 2025-12-04T12:12:57.6300698Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6301154Z collecting ... collected 380 items / 45 deselected / 335 selected 2025-12-04T12:12:57.6301301Z stepcurrent: skipping 45 already run items. 2025-12-04T12:12:57.6301426Z Running 130 items in this shard 2025-12-04T12:12:57.6301431Z 2025-12-04T12:12:57.6302438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6303492Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.6304483Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0037s] (Skip non-critical tests to save resources.) 
[ 2%] 2025-12-04T12:12:57.6305387Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5642s] [ 3%] 2025-12-04T12:12:57.6306311Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [ 3%] 2025-12-04T12:12:57.6307121Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1575s] [ 3%] 2025-12-04T12:12:57.6307140Z 2025-12-04T12:12:57.6307277Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6307820Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6307951Z Traceback (most recent call last): 2025-12-04T12:12:57.6308415Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6308610Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6308828Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6308833Z 2025-12-04T12:12:57.6309042Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6309981Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6309988Z 2025-12-04T12:12:57.6310250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6310463Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.6310586Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6310704Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6311050Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6311262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6311358Z graph_break [] 2025-12-04T12:12:57.6311579Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6312306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6312404Z warnings.warn( 2025-12-04T12:12:57.6313008Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6313128Z Traceback (most recent call last): 2025-12-04T12:12:57.6313602Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6313833Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6314041Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6314046Z 2025-12-04T12:12:57.6314271Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6315225Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6315231Z 2025-12-04T12:12:57.6315504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6315720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6315831Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6315958Z stats 
[('calls_captured', 10)] 2025-12-04T12:12:57.6316293Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6316540Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6316650Z graph_break [] 2025-12-04T12:12:57.6316860Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6317594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6317697Z warnings.warn( 2025-12-04T12:12:57.6317907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6318035Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6318148Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6318363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6318703Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6318798Z graph_break [] 2025-12-04T12:12:57.6319026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6319745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6319845Z warnings.warn( 2025-12-04T12:12:57.6320003Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6320544Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6320662Z Traceback (most recent call last): 2025-12-04T12:12:57.6321137Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.6321329Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6321549Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6321559Z 2025-12-04T12:12:57.6321765Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6322760Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6322782Z 2025-12-04T12:12:57.6323042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6323256Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6323382Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6323537Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6323866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6324098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6324194Z graph_break [] 2025-12-04T12:12:57.6324436Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6325167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6325264Z warnings.warn( 2025-12-04T12:12:57.6325483Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6325590Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6325732Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6325963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6326292Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.6326385Z graph_break [] 2025-12-04T12:12:57.6326606Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6327315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6327457Z warnings.warn( 2025-12-04T12:12:57.6327662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6327769Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6327890Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6328102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6328433Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6328538Z graph_break [] 2025-12-04T12:12:57.6328746Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6329468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6329566Z warnings.warn( 2025-12-04T12:12:57.6330368Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml - 2025-12-04T12:12:57.6330548Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6331611Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6331617Z 2025-12-04T12:12:57.6331843Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6332763Z 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6332769Z 2025-12-04T12:12:57.6333041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6333219Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6333429Z ============= 1 failed, 3 skipped, 45 deselected, 2 rerun in 4.95s ============= 2025-12-04T12:12:57.6333538Z Got exit code 1 2025-12-04T12:12:57.6333641Z Retrying single test... 2025-12-04T12:12:57.6334270Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml 2025-12-04T12:12:57.6334441Z ============================= test session starts ============================== 2025-12-04T12:12:57.6334821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6334942Z cachedir: .pytest_cache 2025-12-04T12:12:57.6335452Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6335602Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6335722Z configfile: pytest.ini 2025-12-04T12:12:57.6336297Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6336518Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6337561Z stepcurrent: skipping 48 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6337674Z Running 1 items in this shard 2025-12-04T12:12:57.6337681Z 2025-12-04T12:12:57.6338582Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5368s] [100%] 2025-12-04T12:12:57.6339470Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%] 2025-12-04T12:12:57.6340324Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1588s] [100%] 2025-12-04T12:12:57.6340329Z 2025-12-04T12:12:57.6340469Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6341013Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6341148Z Traceback (most recent call last): 2025-12-04T12:12:57.6341609Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6341816Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6342026Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6342031Z 2025-12-04T12:12:57.6342237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6343172Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6343176Z 2025-12-04T12:12:57.6343440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6343667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6343777Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6343889Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6344237Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6344454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6344554Z graph_break [] 2025-12-04T12:12:57.6344782Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6345503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6345617Z warnings.warn( 2025-12-04T12:12:57.6346174Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6346293Z Traceback (most recent call last): 2025-12-04T12:12:57.6346768Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6347020Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6347238Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6347243Z 2025-12-04T12:12:57.6347480Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6348413Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6348418Z 2025-12-04T12:12:57.6348689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6348927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6349049Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6349161Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6349494Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6349718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6349815Z graph_break [] 2025-12-04T12:12:57.6350028Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6350798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6350897Z warnings.warn( 2025-12-04T12:12:57.6351117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6351223Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6351335Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6351561Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6351891Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6351988Z graph_break [] 2025-12-04T12:12:57.6352206Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6352920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6353033Z warnings.warn( 2025-12-04T12:12:57.6353173Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6353717Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6353847Z Traceback (most recent call last): 2025-12-04T12:12:57.6354304Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6354496Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6354714Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6354719Z 2025-12-04T12:12:57.6354925Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6355858Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6355865Z 2025-12-04T12:12:57.6356122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6356333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6356453Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6356562Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6356903Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6357119Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6357249Z graph_break [] 2025-12-04T12:12:57.6357468Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6358187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6358317Z warnings.warn( 
2025-12-04T12:12:57.6358540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6358647Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6358772Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6358986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6359346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6359456Z graph_break [] 2025-12-04T12:12:57.6359666Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6360380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6360495Z warnings.warn( 2025-12-04T12:12:57.6360705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6360860Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6360970Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6361185Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6361531Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6361627Z graph_break [] 2025-12-04T12:12:57.6361839Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6362633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6362735Z warnings.warn( 2025-12-04T12:12:57.6363540Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml - 2025-12-04T12:12:57.6363711Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6364771Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6364792Z 2025-12-04T12:12:57.6365002Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6365919Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6365927Z 2025-12-04T12:12:57.6366199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6366372Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6366564Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.6366678Z Got exit code 1 2025-12-04T12:12:57.6366782Z Retrying single test... 
2025-12-04T12:12:57.6367420Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml 2025-12-04T12:12:57.6367576Z ============================= test session starts ============================== 2025-12-04T12:12:57.6367917Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6368037Z cachedir: .pytest_cache 2025-12-04T12:12:57.6368546Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6368735Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6368840Z configfile: pytest.ini 2025-12-04T12:12:57.6369415Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6369683Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6370692Z stepcurrent: skipping 48 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6370806Z Running 1 items in this shard 2025-12-04T12:12:57.6370826Z 2025-12-04T12:12:57.6371747Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5476s] [100%] 2025-12-04T12:12:57.6372639Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1611s] [100%] 2025-12-04T12:12:57.6373466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1606s] [100%] 2025-12-04T12:12:57.6373522Z 2025-12-04T12:12:57.6373663Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6374224Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6374346Z Traceback (most recent call last): 2025-12-04T12:12:57.6374810Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6375023Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6375229Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6375234Z 2025-12-04T12:12:57.6375458Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6376386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6376393Z 2025-12-04T12:12:57.6376652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6376881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6376993Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6377122Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6377455Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6377673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6377785Z graph_break [] 2025-12-04T12:12:57.6377996Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6378721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6378838Z warnings.warn( 2025-12-04T12:12:57.6379386Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6379521Z Traceback (most recent call last): 2025-12-04T12:12:57.6379983Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6380176Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6380392Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6380429Z 2025-12-04T12:12:57.6380638Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6381599Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6381608Z 2025-12-04T12:12:57.6381869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6382080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6382201Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6382317Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6382676Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6382910Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6383010Z graph_break [] 2025-12-04T12:12:57.6383240Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6383959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6384060Z warnings.warn( 2025-12-04T12:12:57.6384336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6384442Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6384556Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6384786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6385111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6385221Z graph_break [] 2025-12-04T12:12:57.6385434Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6386147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6386258Z warnings.warn( 2025-12-04T12:12:57.6386395Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6386944Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6387077Z Traceback (most recent call last): 2025-12-04T12:12:57.6387539Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6387742Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6387946Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6387953Z 2025-12-04T12:12:57.6388157Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6389092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6389099Z 2025-12-04T12:12:57.6389361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6389588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6389701Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6389813Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6390151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6390362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6390468Z graph_break [] 2025-12-04T12:12:57.6390680Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6391393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6391541Z warnings.warn( 
2025-12-04T12:12:57.6391748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6391856Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6392012Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6392230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6392571Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6392667Z graph_break [] 2025-12-04T12:12:57.6392875Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6393623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6393722Z warnings.warn( 2025-12-04T12:12:57.6393932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6394053Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6394164Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6394376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6394718Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6394846Z graph_break [] 2025-12-04T12:12:57.6395067Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6395773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6395869Z warnings.warn( 2025-12-04T12:12:57.6396678Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml - 2025-12-04T12:12:57.6396847Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6397911Z FAILED [0.1606s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6397919Z 2025-12-04T12:12:57.6398131Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6399050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6399069Z 2025-12-04T12:12:57.6399331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6399506Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6399717Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.6399813Z Got exit code 1 2025-12-04T12:12:57.6400651Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6401236Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6401861Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml 2025-12-04T12:12:57.6402040Z ============================= test session starts ============================== 2025-12-04T12:12:57.6402480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6402591Z cachedir: .pytest_cache 2025-12-04T12:12:57.6403114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6403317Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6403439Z configfile: pytest.ini 2025-12-04T12:12:57.6404059Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6404284Z collecting ... collected 380 items / 49 deselected / 331 selected 2025-12-04T12:12:57.6404439Z stepcurrent: skipping 49 already run items. 2025-12-04T12:12:57.6404553Z Running 126 items in this shard 2025-12-04T12:12:57.6404558Z 2025-12-04T12:12:57.6405597Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6406604Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.6407598Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) 
[ 2%] 2025-12-04T12:12:57.6408532Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5452s] [ 3%] 2025-12-04T12:12:57.6409418Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [ 3%] 2025-12-04T12:12:57.6410234Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1589s] [ 3%] 2025-12-04T12:12:57.6410242Z 2025-12-04T12:12:57.6410379Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6410939Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6411059Z Traceback (most recent call last): 2025-12-04T12:12:57.6411524Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6411727Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6411933Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6411939Z 2025-12-04T12:12:57.6412164Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6413083Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6413090Z 2025-12-04T12:12:57.6413349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6413577Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.6413685Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6413798Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6414144Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6414360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6414468Z graph_break [] 2025-12-04T12:12:57.6414678Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6415394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6415537Z warnings.warn( 2025-12-04T12:12:57.6416087Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6416218Z Traceback (most recent call last): 2025-12-04T12:12:57.6416706Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6416901Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6417117Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6417122Z 2025-12-04T12:12:57.6417329Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6418277Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6418300Z 2025-12-04T12:12:57.6418558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6418769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6418886Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6418997Z stats 
[('calls_captured', 10)] 2025-12-04T12:12:57.6419357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6419584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6419679Z graph_break [] 2025-12-04T12:12:57.6419902Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6420623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6420721Z warnings.warn( 2025-12-04T12:12:57.6420946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6421052Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6421165Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6421390Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6421715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6421823Z graph_break [] 2025-12-04T12:12:57.6422034Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6422743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6422852Z warnings.warn( 2025-12-04T12:12:57.6422997Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6423542Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6423676Z Traceback (most recent call last): 2025-12-04T12:12:57.6424140Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.6424348Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6424559Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6424564Z 2025-12-04T12:12:57.6424771Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6425704Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6425711Z 2025-12-04T12:12:57.6425973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6426193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6426337Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6426449Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6439126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6439632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6439746Z graph_break [] 2025-12-04T12:12:57.6439992Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6440726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6440827Z warnings.warn( 2025-12-04T12:12:57.6441132Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6441245Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6441375Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6441601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6441935Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.6442048Z graph_break [] 2025-12-04T12:12:57.6442363Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6443127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6443247Z warnings.warn( 2025-12-04T12:12:57.6443458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6443586Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6443702Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6443917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6444264Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6444365Z graph_break [] 2025-12-04T12:12:57.6444577Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6445306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6445408Z warnings.warn( 2025-12-04T12:12:57.6446219Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml - 2025-12-04T12:12:57.6446388Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6447456Z FAILED [0.1589s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6447477Z 2025-12-04T12:12:57.6447688Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6448610Z 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6448619Z 2025-12-04T12:12:57.6448897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6449074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6449284Z ============= 1 failed, 3 skipped, 49 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.6449390Z Got exit code 1 2025-12-04T12:12:57.6449495Z Retrying single test... 2025-12-04T12:12:57.6450128Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml 2025-12-04T12:12:57.6450333Z ============================= test session starts ============================== 2025-12-04T12:12:57.6450676Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6450797Z cachedir: .pytest_cache 2025-12-04T12:12:57.6451334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6451459Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6451580Z configfile: pytest.ini 2025-12-04T12:12:57.6452158Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6452395Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6453425Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6453546Z Running 1 items in this shard 2025-12-04T12:12:57.6453551Z 2025-12-04T12:12:57.6454462Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [100%] 2025-12-04T12:12:57.6455450Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1629s] [100%] 2025-12-04T12:12:57.6456270Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1593s] [100%] 2025-12-04T12:12:57.6456275Z 2025-12-04T12:12:57.6456413Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6456972Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6457093Z Traceback (most recent call last): 2025-12-04T12:12:57.6457551Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6457762Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6457972Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6457977Z 2025-12-04T12:12:57.6458183Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6459118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6459123Z 2025-12-04T12:12:57.6459384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6459610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6459719Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6459832Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6460178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6460395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6460503Z graph_break [] 2025-12-04T12:12:57.6460714Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6461432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6461546Z warnings.warn( 2025-12-04T12:12:57.6462088Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6462238Z Traceback (most recent call last): 2025-12-04T12:12:57.6462709Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6462903Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6463151Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6463159Z 2025-12-04T12:12:57.6463371Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6464293Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6464311Z 2025-12-04T12:12:57.6464599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6464812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6464937Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6465049Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6465380Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6465609Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6465737Z graph_break [] 2025-12-04T12:12:57.6465946Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6466676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6466774Z warnings.warn( 2025-12-04T12:12:57.6467007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6467168Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6467327Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6467575Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6467960Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6468058Z graph_break [] 2025-12-04T12:12:57.6468278Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6468994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6469108Z warnings.warn( 2025-12-04T12:12:57.6469246Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6469787Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6469921Z Traceback (most recent call last): 2025-12-04T12:12:57.6470380Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6470591Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6470796Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6470801Z 2025-12-04T12:12:57.6471010Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6471944Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6471951Z 2025-12-04T12:12:57.6472209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6472433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6472544Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6472657Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6473003Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6473290Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6473384Z graph_break [] 2025-12-04T12:12:57.6473608Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6474355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6474472Z warnings.warn( 
2025-12-04T12:12:57.6474680Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6474786Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6474913Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6475156Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6475486Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6475594Z graph_break [] 2025-12-04T12:12:57.6475802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6476522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6476621Z warnings.warn( 2025-12-04T12:12:57.6476864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6476983Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6477093Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6477304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6477643Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6477740Z graph_break [] 2025-12-04T12:12:57.6477961Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6478671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6478769Z warnings.warn( 2025-12-04T12:12:57.6479577Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml - 2025-12-04T12:12:57.6479746Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6480815Z FAILED [0.1593s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6480822Z 2025-12-04T12:12:57.6481038Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6481964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6481971Z 2025-12-04T12:12:57.6482320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6482500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6482715Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.6482816Z Got exit code 1 2025-12-04T12:12:57.6482922Z Retrying single test... 
2025-12-04T12:12:57.6483561Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml 2025-12-04T12:12:57.6483722Z ============================= test session starts ============================== 2025-12-04T12:12:57.6484068Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6484236Z cachedir: .pytest_cache 2025-12-04T12:12:57.6484751Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6484887Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6484996Z configfile: pytest.ini 2025-12-04T12:12:57.6485623Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6485867Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6486873Z stepcurrent: skipping 52 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6487031Z Running 1 items in this shard 2025-12-04T12:12:57.6487037Z 2025-12-04T12:12:57.6487923Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5652s] [100%] 2025-12-04T12:12:57.6488805Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1628s] [100%] 2025-12-04T12:12:57.6489650Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1607s] [100%] 2025-12-04T12:12:57.6489655Z 2025-12-04T12:12:57.6489792Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6490347Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6490467Z Traceback (most recent call last): 2025-12-04T12:12:57.6490931Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6491138Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6491344Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6491349Z 2025-12-04T12:12:57.6491568Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6492490Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6492495Z 2025-12-04T12:12:57.6492771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6492985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6493097Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6493221Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6493556Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6493769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6493877Z graph_break [] 2025-12-04T12:12:57.6494083Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6494819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6494920Z warnings.warn( 2025-12-04T12:12:57.6495468Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6495602Z Traceback (most recent call last): 2025-12-04T12:12:57.6496061Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6496251Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6496513Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6496518Z 2025-12-04T12:12:57.6496726Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6497689Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6497697Z 2025-12-04T12:12:57.6497957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6498167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6498288Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6498429Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6498775Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6498990Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6499084Z graph_break [] 2025-12-04T12:12:57.6499309Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6500026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6500153Z warnings.warn( 2025-12-04T12:12:57.6500376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6500485Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6500611Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6501035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6501421Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6501533Z graph_break [] 2025-12-04T12:12:57.6501743Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6502458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6502572Z warnings.warn( 2025-12-04T12:12:57.6502713Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6503271Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6503393Z Traceback (most recent call last): 2025-12-04T12:12:57.6503852Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6504062Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6504270Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6504276Z 2025-12-04T12:12:57.6504496Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6505415Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6505420Z 2025-12-04T12:12:57.6505686Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6505911Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6506019Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6506136Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6506478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6506692Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6506804Z graph_break [] 2025-12-04T12:12:57.6507013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6507812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6507923Z warnings.warn( 
2025-12-04T12:12:57.6508176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6508289Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6508414Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6508627Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6508966Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6509062Z graph_break [] 2025-12-04T12:12:57.6509320Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6510048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6510149Z warnings.warn( 2025-12-04T12:12:57.6510355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6510474Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6510584Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6510854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6511184Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6511280Z graph_break [] 2025-12-04T12:12:57.6511501Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6512213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6512310Z warnings.warn( 2025-12-04T12:12:57.6513115Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml - 2025-12-04T12:12:57.6513284Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6514349Z FAILED [0.1607s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6514357Z 2025-12-04T12:12:57.6514568Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6515508Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6515514Z 2025-12-04T12:12:57.6515776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6515952Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6516158Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.6516254Z Got exit code 1 2025-12-04T12:12:57.6517106Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6517507Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6518132Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml 2025-12-04T12:12:57.6518305Z ============================= test session starts ============================== 2025-12-04T12:12:57.6518646Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6518799Z cachedir: .pytest_cache 2025-12-04T12:12:57.6519311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6519434Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6519552Z configfile: pytest.ini 2025-12-04T12:12:57.6520159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6520384Z collecting ... collected 380 items / 53 deselected / 327 selected 2025-12-04T12:12:57.6520536Z stepcurrent: skipping 53 already run items. 2025-12-04T12:12:57.6520648Z Running 122 items in this shard 2025-12-04T12:12:57.6520653Z 2025-12-04T12:12:57.6521700Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6522659Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5633s] [ 1%] 2025-12-04T12:12:57.6523545Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1611s] [ 1%] 2025-12-04T12:12:57.6524401Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1614s] [ 1%] 2025-12-04T12:12:57.6524407Z 2025-12-04T12:12:57.6524551Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6525111Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6525234Z Traceback (most recent call last): 2025-12-04T12:12:57.6525714Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6525910Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6526123Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6526130Z 2025-12-04T12:12:57.6526355Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6527278Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6527283Z 2025-12-04T12:12:57.6527558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6527772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6527884Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6528011Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6528344Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6528559Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6528669Z graph_break [] 2025-12-04T12:12:57.6528881Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6529611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6529711Z warnings.warn( 2025-12-04T12:12:57.6530267Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6530397Z Traceback (most recent call last): 2025-12-04T12:12:57.6530855Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.6531118Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6531333Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6531339Z 2025-12-04T12:12:57.6531547Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6532520Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6532526Z 2025-12-04T12:12:57.6532786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6533027Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6533150Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6533264Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6533610Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6533825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6533919Z graph_break [] 2025-12-04T12:12:57.6534142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6534894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6534991Z warnings.warn( 2025-12-04T12:12:57.6535210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6535322Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6535444Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6535660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6535987Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.6536095Z graph_break [] 2025-12-04T12:12:57.6536304Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6537014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6537126Z warnings.warn( 2025-12-04T12:12:57.6537267Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6537821Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6537939Z Traceback (most recent call last): 2025-12-04T12:12:57.6538401Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6538601Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6538807Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6538813Z 2025-12-04T12:12:57.6539034Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6539956Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6539963Z 2025-12-04T12:12:57.6540224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6540444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6540552Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6540676Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6541005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6541223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 
1)] 2025-12-04T12:12:57.6541368Z graph_break [] 2025-12-04T12:12:57.6541578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6542296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6542442Z warnings.warn( 2025-12-04T12:12:57.6542653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6542775Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6542890Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6543104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6543446Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6543573Z graph_break [] 2025-12-04T12:12:57.6543781Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6544503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6544601Z warnings.warn( 2025-12-04T12:12:57.6544823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6544969Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6545080Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6545308Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6545635Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6545730Z graph_break [] 2025-12-04T12:12:57.6545949Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6546660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6546769Z warnings.warn( 2025-12-04T12:12:57.6547572Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml - 2025-12-04T12:12:57.6547741Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6548803Z FAILED [0.1614s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6548808Z 2025-12-04T12:12:57.6549020Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6549963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6549971Z 2025-12-04T12:12:57.6550230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6550405Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6550625Z ============= 1 failed, 1 skipped, 53 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:57.6550727Z Got exit code 1 2025-12-04T12:12:57.6550844Z Retrying single test... 
2025-12-04T12:12:57.6551476Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml 2025-12-04T12:12:57.6551638Z ============================= test session starts ============================== 2025-12-04T12:12:57.6551996Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6552104Z cachedir: .pytest_cache 2025-12-04T12:12:57.6552610Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6552783Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6552888Z configfile: pytest.ini 2025-12-04T12:12:57.6553473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6553729Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6554732Z stepcurrent: skipping 54 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6554858Z Running 1 items in this shard 2025-12-04T12:12:57.6554863Z 2025-12-04T12:12:57.6555790Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5540s] [100%] 2025-12-04T12:12:57.6556698Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%] 2025-12-04T12:12:57.6557508Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%] 2025-12-04T12:12:57.6557549Z 2025-12-04T12:12:57.6557700Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6558250Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6558369Z Traceback (most recent call last): 2025-12-04T12:12:57.6558843Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6559051Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6559268Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6559273Z 2025-12-04T12:12:57.6559481Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6560402Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6560409Z 2025-12-04T12:12:57.6560683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6560898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6561024Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6561139Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6561470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6561700Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6561800Z graph_break [] 2025-12-04T12:12:57.6562010Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6562819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6562921Z warnings.warn( 2025-12-04T12:12:57.6563477Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6563598Z Traceback (most recent call last): 2025-12-04T12:12:57.6564058Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6564270Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6564476Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6564525Z 2025-12-04T12:12:57.6564751Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6565676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6565725Z 2025-12-04T12:12:57.6565991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6566216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6566327Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6566451Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6566785Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6567029Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6567144Z graph_break [] 2025-12-04T12:12:57.6567359Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6568080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6568195Z warnings.warn( 2025-12-04T12:12:57.6568406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6568575Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6568691Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6568907Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6569244Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6569338Z graph_break [] 2025-12-04T12:12:57.6569551Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6570273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6570377Z warnings.warn( 2025-12-04T12:12:57.6570529Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6571073Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6571193Z Traceback (most recent call last): 2025-12-04T12:12:57.6571662Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6571855Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6572058Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6572063Z 2025-12-04T12:12:57.6572283Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6573200Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6573208Z 2025-12-04T12:12:57.6573478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6573690Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6573799Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6573923Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6574251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6574473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6574568Z graph_break [] 2025-12-04T12:12:57.6574778Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6575504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6575646Z warnings.warn( 
2025-12-04T12:12:57.6575852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6575973Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6576085Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6576341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6576669Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6576760Z graph_break [] 2025-12-04T12:12:57.6576982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6577726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6577827Z warnings.warn( 2025-12-04T12:12:57.6578052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6578160Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6578286Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6578502Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6578830Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6578970Z graph_break [] 2025-12-04T12:12:57.6579177Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6579886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6579996Z warnings.warn( 2025-12-04T12:12:57.6580797Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml - 2025-12-04T12:12:57.6580982Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6582035Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6582046Z 2025-12-04T12:12:57.6582257Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6583194Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6583200Z 2025-12-04T12:12:57.6583461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6583647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6583840Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.6583935Z Got exit code 1 2025-12-04T12:12:57.6584049Z Retrying single test... 
2025-12-04T12:12:57.6584677Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml 2025-12-04T12:12:57.6584850Z ============================= test session starts ============================== 2025-12-04T12:12:57.6585190Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6585295Z cachedir: .pytest_cache 2025-12-04T12:12:57.6585817Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6585939Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6586045Z configfile: pytest.ini 2025-12-04T12:12:57.6586635Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6586897Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6587944Z stepcurrent: skipping 54 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6588059Z Running 1 items in this shard 2025-12-04T12:12:57.6588064Z 2025-12-04T12:12:57.6588966Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5363s] [100%] 2025-12-04T12:12:57.6589882Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [100%] 2025-12-04T12:12:57.6590696Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [100%] 2025-12-04T12:12:57.6590701Z 2025-12-04T12:12:57.6590849Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6591430Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6591569Z Traceback (most recent call last): 2025-12-04T12:12:57.6592032Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6592228Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6592456Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6592460Z 2025-12-04T12:12:57.6592671Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6593608Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6593613Z 2025-12-04T12:12:57.6593876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6594101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6594225Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6594339Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6594683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6594901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6595001Z graph_break [] 2025-12-04T12:12:57.6595229Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6595947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6596047Z warnings.warn( 2025-12-04T12:12:57.6596606Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6596731Z Traceback (most recent call last): 2025-12-04T12:12:57.6597203Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6597396Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6597604Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6597609Z 2025-12-04T12:12:57.6597833Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6598750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6598797Z 2025-12-04T12:12:57.6599070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6599283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6599422Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6599548Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6599877Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6600089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6600194Z graph_break [] 2025-12-04T12:12:57.6600405Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6601427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6601532Z warnings.warn( 2025-12-04T12:12:57.6601742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6601864Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6601975Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6602256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6602655Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6602748Z graph_break [] 2025-12-04T12:12:57.6602970Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6603684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6603780Z warnings.warn( 2025-12-04T12:12:57.6603934Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6604478Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6604599Z Traceback (most recent call last): 2025-12-04T12:12:57.6605068Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6605263Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6605480Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6605485Z 2025-12-04T12:12:57.6605696Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6606614Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6606632Z 2025-12-04T12:12:57.6606891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6607100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6607220Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6607334Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6607665Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6607893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6607988Z graph_break [] 2025-12-04T12:12:57.6608196Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6608923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6609026Z warnings.warn( 
2025-12-04T12:12:57.6609247Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6609413Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6609524Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6609747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6610074Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6610169Z graph_break [] 2025-12-04T12:12:57.6610433Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6611148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6611257Z warnings.warn( 2025-12-04T12:12:57.6611464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6611617Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6611742Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6611956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6612282Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6612391Z graph_break [] 2025-12-04T12:12:57.6612600Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6613323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6613452Z warnings.warn( 2025-12-04T12:12:57.6614259Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml - 2025-12-04T12:12:57.6614438Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6615495Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6615503Z 2025-12-04T12:12:57.6615731Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6616653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6616661Z 2025-12-04T12:12:57.6616933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6617108Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6617304Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.6617418Z Got exit code 1 2025-12-04T12:12:57.6618251Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6618655Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6619292Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml 2025-12-04T12:12:57.6619454Z ============================= test session starts ============================== 2025-12-04T12:12:57.6619806Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6619914Z cachedir: .pytest_cache 2025-12-04T12:12:57.6620420Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6620555Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6620661Z configfile: pytest.ini 2025-12-04T12:12:57.6621250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6621509Z collecting ... collected 380 items / 55 deselected / 325 selected 2025-12-04T12:12:57.6621648Z stepcurrent: skipping 55 already run items. 2025-12-04T12:12:57.6621774Z Running 120 items in this shard 2025-12-04T12:12:57.6621781Z 2025-12-04T12:12:57.6622818Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6623860Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) 
[ 1%] 2025-12-04T12:12:57.6624740Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5466s] [ 2%] 2025-12-04T12:12:57.6625618Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1625s] [ 2%] 2025-12-04T12:12:57.6626471Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1607s] [ 2%] 2025-12-04T12:12:57.6626477Z 2025-12-04T12:12:57.6626616Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6627176Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6627298Z Traceback (most recent call last): 2025-12-04T12:12:57.6627761Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6627973Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6628180Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6628185Z 2025-12-04T12:12:57.6628409Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6629334Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6629339Z 2025-12-04T12:12:57.6629611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6629831Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.6629941Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6630066Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6630395Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6630610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6630718Z graph_break [] 2025-12-04T12:12:57.6630926Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6633592Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6633695Z return x.grad, w.grad 2025-12-04T12:12:57.6634446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6634555Z warnings.warn( 2025-12-04T12:12:57.6637230Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. 
(Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6637459Z return x.grad, w.grad 2025-12-04T12:12:57.6638001Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6638138Z Traceback (most recent call last): 2025-12-04T12:12:57.6638592Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6638787Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6639036Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6639042Z 2025-12-04T12:12:57.6639249Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6640185Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6640191Z 2025-12-04T12:12:57.6640453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6640664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6640788Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6640900Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6641250Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6641461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6641561Z graph_break [] 2025-12-04T12:12:57.6641780Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6644495Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6644616Z return x.grad, w.grad 2025-12-04T12:12:57.6645337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6645436Z warnings.warn( 2025-12-04T12:12:57.6648099Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6648244Z return x.grad, w.grad 2025-12-04T12:12:57.6648471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6648581Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6648692Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6648953Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6649288Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6649394Z graph_break [] 2025-12-04T12:12:57.6649603Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6652269Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6652389Z return x.grad, w.grad 2025-12-04T12:12:57.6653136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6653248Z warnings.warn( 2025-12-04T12:12:57.6655887Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6656004Z return x.grad, w.grad 2025-12-04T12:12:57.6656145Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6656685Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6656814Z Traceback (most recent call last): 2025-12-04T12:12:57.6657272Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6657476Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6657682Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6657688Z 2025-12-04T12:12:57.6657893Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6658832Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6658837Z 2025-12-04T12:12:57.6659099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6659322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6659430Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6659540Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6659884Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6660098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6660207Z graph_break [] 2025-12-04T12:12:57.6660417Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6663180Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6663300Z return x.grad, w.grad 2025-12-04T12:12:57.6664044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6664157Z warnings.warn( 2025-12-04T12:12:57.6666785Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6666930Z return x.grad, w.grad 2025-12-04T12:12:57.6667141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6667248Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6667374Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6667593Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6667924Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6668038Z graph_break [] 2025-12-04T12:12:57.6668249Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6670888Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6670994Z return x.grad, w.grad 2025-12-04T12:12:57.6671721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6671819Z warnings.warn( 2025-12-04T12:12:57.6674457Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6674573Z return x.grad, w.grad 2025-12-04T12:12:57.6674781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6674903Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6675013Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6675231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6675613Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6675708Z graph_break [] 2025-12-04T12:12:57.6675927Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6676668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6676769Z warnings.warn( 2025-12-04T12:12:57.6679448Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6679555Z return x.grad, w.grad 2025-12-04T12:12:57.6680370Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml - 2025-12-04T12:12:57.6680568Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6681624Z FAILED [0.1607s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6681629Z 2025-12-04T12:12:57.6681840Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6682822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6682832Z 2025-12-04T12:12:57.6683107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6683284Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6683511Z ============= 1 failed, 2 skipped, 55 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.6683605Z Got exit code 1 2025-12-04T12:12:57.6683709Z Retrying single test... 
2025-12-04T12:12:57.6684344Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml 2025-12-04T12:12:57.6684503Z ============================= test session starts ============================== 2025-12-04T12:12:57.6684845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6684965Z cachedir: .pytest_cache 2025-12-04T12:12:57.6685471Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6685601Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6685705Z configfile: pytest.ini 2025-12-04T12:12:57.6686281Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6686515Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6687522Z stepcurrent: skipping 57 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6687647Z Running 1 items in this shard 2025-12-04T12:12:57.6687652Z 2025-12-04T12:12:57.6688537Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5642s] [100%] 2025-12-04T12:12:57.6689484Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1645s] [100%] 2025-12-04T12:12:57.6690294Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1644s] [100%] 2025-12-04T12:12:57.6690300Z 2025-12-04T12:12:57.6690436Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6691016Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6691135Z Traceback (most recent call last): 2025-12-04T12:12:57.6691610Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6691802Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6692009Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6692014Z 2025-12-04T12:12:57.6692236Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6693187Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6693192Z 2025-12-04T12:12:57.6693467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6693681Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6693790Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6693921Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6694259Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6694473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6694582Z graph_break [] 2025-12-04T12:12:57.6694794Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6697461Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6697568Z return x.grad, w.grad 2025-12-04T12:12:57.6698298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6698396Z warnings.warn( 2025-12-04T12:12:57.6701292Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6701418Z return x.grad, w.grad 2025-12-04T12:12:57.6701956Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6702163Z Traceback (most recent call last): 2025-12-04T12:12:57.6702622Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6702856Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6703083Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6703089Z 2025-12-04T12:12:57.6703298Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6704265Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6704271Z 2025-12-04T12:12:57.6704533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6704750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6704875Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6704989Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6705345Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6705605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.6705703Z graph_break [] 2025-12-04T12:12:57.6705929Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6708579Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6708699Z return x.grad, w.grad 2025-12-04T12:12:57.6709419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6709520Z warnings.warn( 2025-12-04T12:12:57.6712169Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6712278Z return x.grad, w.grad 2025-12-04T12:12:57.6712506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6712619Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6712749Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6712974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6713308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6713416Z graph_break [] 2025-12-04T12:12:57.6713634Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6716300Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6716487Z return x.grad, w.grad 2025-12-04T12:12:57.6717209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6717320Z warnings.warn( 2025-12-04T12:12:57.6719997Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6720122Z return x.grad, w.grad 2025-12-04T12:12:57.6720268Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6720851Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6720973Z Traceback (most recent call last): 2025-12-04T12:12:57.6721434Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6721646Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6721853Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6721858Z 2025-12-04T12:12:57.6722072Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6723066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6723075Z 2025-12-04T12:12:57.6723338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6723564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6723673Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6723784Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6724130Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6724345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6724458Z graph_break [] 2025-12-04T12:12:57.6724669Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6727306Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6727427Z return x.grad, w.grad 2025-12-04T12:12:57.6728140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6728252Z warnings.warn( 2025-12-04T12:12:57.6730929Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6731079Z return x.grad, w.grad 2025-12-04T12:12:57.6731287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6731394Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6731520Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6731765Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6732111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6732207Z graph_break [] 2025-12-04T12:12:57.6732416Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6735052Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6735188Z return x.grad, w.grad 2025-12-04T12:12:57.6735910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6736011Z warnings.warn( 2025-12-04T12:12:57.6738654Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6738763Z return x.grad, w.grad 2025-12-04T12:12:57.6738975Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6739097Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6739209Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6739428Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6739771Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6739869Z graph_break [] 2025-12-04T12:12:57.6740091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6740801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6740901Z warnings.warn( 2025-12-04T12:12:57.6743552Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6743696Z return x.grad, w.grad 2025-12-04T12:12:57.6744529Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml - 2025-12-04T12:12:57.6744704Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6745792Z FAILED [0.1644s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6745799Z 2025-12-04T12:12:57.6746013Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6746932Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6746951Z 2025-12-04T12:12:57.6747208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6747418Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6747625Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.6747722Z Got exit code 1 2025-12-04T12:12:57.6747826Z Retrying single test... 
2025-12-04T12:12:57.6748465Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml 2025-12-04T12:12:57.6748624Z ============================= test session starts ============================== 2025-12-04T12:12:57.6748981Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6749089Z cachedir: .pytest_cache 2025-12-04T12:12:57.6749593Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6749727Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6749837Z configfile: pytest.ini 2025-12-04T12:12:57.6750538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6750775Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6751779Z stepcurrent: skipping 57 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6751906Z Running 1 items in this shard 2025-12-04T12:12:57.6751912Z 2025-12-04T12:12:57.6752796Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5480s] [100%] 2025-12-04T12:12:57.6753689Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1656s] [100%] 2025-12-04T12:12:57.6754490Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1648s] [100%] 2025-12-04T12:12:57.6754496Z 2025-12-04T12:12:57.6754630Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6755184Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6755304Z Traceback (most recent call last): 2025-12-04T12:12:57.6755823Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6756018Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6756224Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6756259Z 2025-12-04T12:12:57.6756481Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6757396Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6757401Z 2025-12-04T12:12:57.6757677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6757919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6758029Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6758155Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6758488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6758703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6758810Z graph_break [] 2025-12-04T12:12:57.6759067Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6761735Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6761841Z return x.grad, w.grad 2025-12-04T12:12:57.6762638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6762740Z warnings.warn( 2025-12-04T12:12:57.6765375Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6765493Z return x.grad, w.grad 2025-12-04T12:12:57.6766033Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6766167Z Traceback (most recent call last): 2025-12-04T12:12:57.6766627Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6766824Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6767045Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6767050Z 2025-12-04T12:12:57.6767259Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6768192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6768197Z 2025-12-04T12:12:57.6768458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6768711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6768832Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6768945Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6769315Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6769530Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.6769624Z graph_break [] 2025-12-04T12:12:57.6769845Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6772515Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6772635Z return x.grad, w.grad 2025-12-04T12:12:57.6773350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6773478Z warnings.warn( 2025-12-04T12:12:57.6776127Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6776234Z return x.grad, w.grad 2025-12-04T12:12:57.6776457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6776565Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6776694Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6776910Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6777240Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6777351Z graph_break [] 2025-12-04T12:12:57.6777559Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6780209Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6780318Z return x.grad, w.grad 2025-12-04T12:12:57.6781029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6781140Z warnings.warn( 2025-12-04T12:12:57.6783765Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6783923Z return x.grad, w.grad 2025-12-04T12:12:57.6784099Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6784654Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.6784772Z Traceback (most recent call last): 2025-12-04T12:12:57.6785228Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6785465Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6785673Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6785681Z 2025-12-04T12:12:57.6785890Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6786822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6786862Z 2025-12-04T12:12:57.6787124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6787353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6787461Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6787574Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6787920Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6788132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6788241Z graph_break [] 2025-12-04T12:12:57.6788451Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6791093Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6791211Z return x.grad, w.grad 2025-12-04T12:12:57.6791930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6792039Z warnings.warn( 2025-12-04T12:12:57.6794682Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6794799Z return x.grad, w.grad 2025-12-04T12:12:57.6795009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6795118Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6795245Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6795465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6795841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6795935Z graph_break [] 2025-12-04T12:12:57.6796146Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6798843Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6798948Z return x.grad, w.grad 2025-12-04T12:12:57.6799676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6799778Z warnings.warn( 2025-12-04T12:12:57.6802657Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.6802838Z return x.grad, w.grad 2025-12-04T12:12:57.6803055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6803182Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6803300Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6803534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6803866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6803967Z graph_break [] 2025-12-04T12:12:57.6804195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6804915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6805015Z warnings.warn( 2025-12-04T12:12:57.6807670Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.6807776Z return x.grad, w.grad 2025-12-04T12:12:57.6808600Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml - 2025-12-04T12:12:57.6808769Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6809831Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6809837Z 2025-12-04T12:12:57.6810049Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6811020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6811040Z 2025-12-04T12:12:57.6811340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6811521Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.6811731Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.6811830Z Got exit code 1 2025-12-04T12:12:57.6812719Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.6813137Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6813762Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml 2025-12-04T12:12:57.6813937Z ============================= test session starts ============================== 2025-12-04T12:12:57.6814288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6814429Z cachedir: .pytest_cache 2025-12-04T12:12:57.6814952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6815075Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6815196Z configfile: pytest.ini 2025-12-04T12:12:57.6815777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6815998Z collecting ... collected 380 items / 58 deselected / 322 selected 2025-12-04T12:12:57.6816152Z stepcurrent: skipping 58 already run items. 
2025-12-04T12:12:57.6816265Z Running 117 items in this shard 2025-12-04T12:12:57.6816270Z 2025-12-04T12:12:57.6817163Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5306s] [ 0%] 2025-12-04T12:12:57.6818067Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [ 0%] 2025-12-04T12:12:57.6818879Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [ 0%] 2025-12-04T12:12:57.6818884Z 2025-12-04T12:12:57.6819033Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6819579Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6819714Z Traceback (most recent call last): 2025-12-04T12:12:57.6820176Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6820376Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6820597Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6820602Z 2025-12-04T12:12:57.6820807Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6821733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6821738Z 2025-12-04T12:12:57.6821998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 
2025-12-04T12:12:57.6822245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6822366Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6822477Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6822807Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6823064Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6823161Z graph_break [] 2025-12-04T12:12:57.6823383Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6824104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6824238Z warnings.warn( 2025-12-04T12:12:57.6824800Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6824924Z Traceback (most recent call last): 2025-12-04T12:12:57.6825397Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6825590Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6825798Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6825833Z 2025-12-04T12:12:57.6826053Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6826972Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6826977Z 2025-12-04T12:12:57.6827254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6827466Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.6827577Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6827701Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6828031Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6828244Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6828352Z graph_break [] 2025-12-04T12:12:57.6828567Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6829295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6829396Z warnings.warn( 2025-12-04T12:12:57.6829605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6829728Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6829839Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6830050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6830392Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6830488Z graph_break [] 2025-12-04T12:12:57.6830710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6831427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6831528Z warnings.warn( 2025-12-04T12:12:57.6831682Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6832230Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6832353Z Traceback (most recent call last): 2025-12-04T12:12:57.6832823Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6833053Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6833271Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6833276Z 2025-12-04T12:12:57.6833485Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6834444Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6834451Z 2025-12-04T12:12:57.6834727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6834938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6835091Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6835206Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6835538Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6835768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6835863Z graph_break [] 2025-12-04T12:12:57.6836074Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6836805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6836939Z warnings.warn( 2025-12-04T12:12:57.6837163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6837277Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6837389Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6837616Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6837948Z 
inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6838043Z graph_break [] 2025-12-04T12:12:57.6838263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6838973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6839082Z warnings.warn( 2025-12-04T12:12:57.6839294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6839401Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6839523Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6839735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6840061Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6840169Z graph_break [] 2025-12-04T12:12:57.6840378Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6841099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6841198Z warnings.warn( 2025-12-04T12:12:57.6842002Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml - 2025-12-04T12:12:57.6842252Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6843305Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6843311Z 2025-12-04T12:12:57.6843539Z To 
execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6844463Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6844508Z 2025-12-04T12:12:57.6844769Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6844957Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6845250Z ================== 1 failed, 58 deselected, 2 rerun in 4.90s =================== 2025-12-04T12:12:57.6845362Z Got exit code 1 2025-12-04T12:12:57.6845465Z Retrying single test... 2025-12-04T12:12:57.6846092Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml 2025-12-04T12:12:57.6846265Z ============================= test session starts ============================== 2025-12-04T12:12:57.6846636Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6846757Z cachedir: .pytest_cache 2025-12-04T12:12:57.6847264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6847384Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6847507Z configfile: pytest.ini 2025-12-04T12:12:57.6848083Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6848337Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6849352Z stepcurrent: skipping 58 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6849465Z Running 1 items in this shard 2025-12-04T12:12:57.6849471Z 2025-12-04T12:12:57.6850366Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5775s] [100%] 2025-12-04T12:12:57.6851247Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1649s] [100%] 2025-12-04T12:12:57.6852062Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1563s] [100%] 2025-12-04T12:12:57.6852067Z 2025-12-04T12:12:57.6852204Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6852746Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6852876Z Traceback (most recent call last): 2025-12-04T12:12:57.6853334Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6853544Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6853750Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6853755Z 2025-12-04T12:12:57.6853964Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6854896Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6854900Z 2025-12-04T12:12:57.6855158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6855384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6855495Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6855608Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6855984Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6856198Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6856293Z graph_break [] 2025-12-04T12:12:57.6856519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6857286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6857404Z warnings.warn( 2025-12-04T12:12:57.6857953Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6858071Z Traceback (most recent call last): 2025-12-04T12:12:57.6858570Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6858764Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6858971Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6858989Z 2025-12-04T12:12:57.6859199Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6860114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6860147Z 2025-12-04T12:12:57.6860419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6860629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6860737Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6860861Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6861194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6861417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6861513Z graph_break [] 2025-12-04T12:12:57.6861721Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6862452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6862553Z warnings.warn( 2025-12-04T12:12:57.6862763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6862881Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6862994Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6863216Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6863545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6863640Z graph_break [] 2025-12-04T12:12:57.6863872Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6864584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6864684Z warnings.warn( 2025-12-04T12:12:57.6864839Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6865391Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6865525Z Traceback (most recent call last): 2025-12-04T12:12:57.6865985Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6866180Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6866407Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6866412Z 2025-12-04T12:12:57.6866622Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6867603Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6867609Z 2025-12-04T12:12:57.6867898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6868114Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6868239Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6868352Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6868699Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6868945Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6869043Z graph_break [] 2025-12-04T12:12:57.6869264Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6869985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6870084Z warnings.warn( 
2025-12-04T12:12:57.6870308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6870452Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6870576Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6870790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6871120Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6871231Z graph_break [] 2025-12-04T12:12:57.6871442Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6872153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6872262Z warnings.warn( 2025-12-04T12:12:57.6872474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6872596Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6872705Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6872921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6873269Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6873363Z graph_break [] 2025-12-04T12:12:57.6873570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6874295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6874394Z warnings.warn( 2025-12-04T12:12:57.6875212Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml - 2025-12-04T12:12:57.6875379Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6876424Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6876433Z 2025-12-04T12:12:57.6876654Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6877571Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6877576Z 2025-12-04T12:12:57.6877845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6878055Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6878247Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.6878356Z Got exit code 1 2025-12-04T12:12:57.6878458Z Retrying single test... 
2025-12-04T12:12:57.6879126Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml 2025-12-04T12:12:57.6879290Z ============================= test session starts ============================== 2025-12-04T12:12:57.6879629Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6879748Z cachedir: .pytest_cache 2025-12-04T12:12:57.6880282Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6880404Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6880522Z configfile: pytest.ini 2025-12-04T12:12:57.6881098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6881330Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6882401Z stepcurrent: skipping 58 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6882558Z Running 1 items in this shard 2025-12-04T12:12:57.6882563Z 2025-12-04T12:12:57.6883466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5511s] [100%] 2025-12-04T12:12:57.6884354Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%] 2025-12-04T12:12:57.6885174Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1580s] [100%] 2025-12-04T12:12:57.6885182Z 2025-12-04T12:12:57.6885319Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6885876Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6885994Z Traceback (most recent call last): 2025-12-04T12:12:57.6886453Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6886664Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6886874Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6886885Z 2025-12-04T12:12:57.6887104Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6888025Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6888034Z 2025-12-04T12:12:57.6888295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6888519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6888630Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6888743Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6889089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6889303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6889411Z graph_break [] 2025-12-04T12:12:57.6889655Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6890372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6890481Z warnings.warn( 2025-12-04T12:12:57.6891056Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6891191Z Traceback (most recent call last): 2025-12-04T12:12:57.6891651Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6891845Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6892094Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6892099Z 2025-12-04T12:12:57.6892344Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6893261Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6893278Z 2025-12-04T12:12:57.6893541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6893785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6893907Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6894020Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6894352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6894580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6894677Z graph_break [] 2025-12-04T12:12:57.6894899Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6895615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6895717Z warnings.warn( 2025-12-04T12:12:57.6895941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6896051Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6896169Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6896398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6896731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6896839Z graph_break [] 2025-12-04T12:12:57.6897049Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6897767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6897880Z warnings.warn( 2025-12-04T12:12:57.6898018Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6898563Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.6898693Z Traceback (most recent call last): 2025-12-04T12:12:57.6899156Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6899363Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6899567Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6899572Z 2025-12-04T12:12:57.6899781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6900718Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6900782Z 2025-12-04T12:12:57.6901204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6901429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6901538Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6901719Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6902065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6902277Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6902370Z graph_break [] 2025-12-04T12:12:57.6902599Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6903352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6903464Z warnings.warn( 
2025-12-04T12:12:57.6903678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6903791Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6903916Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6904127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6904458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6904614Z graph_break [] 2025-12-04T12:12:57.6904826Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6905548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6905645Z warnings.warn( 2025-12-04T12:12:57.6905852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6905971Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6906084Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6906298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6906637Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6906731Z graph_break [] 2025-12-04T12:12:57.6906953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6907661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6907757Z warnings.warn( 2025-12-04T12:12:57.6908561Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml - 2025-12-04T12:12:57.6908731Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6909795Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6909802Z 2025-12-04T12:12:57.6910012Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6910938Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6910943Z 2025-12-04T12:12:57.6911214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6911390Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6911599Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.6911697Z Got exit code 1 2025-12-04T12:12:57.6912582Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.6912999Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.6913654Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml 2025-12-04T12:12:57.6913828Z ============================= test session starts ============================== 2025-12-04T12:12:57.6914170Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6914278Z cachedir: .pytest_cache 2025-12-04T12:12:57.6914832Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6914957Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6915067Z configfile: pytest.ini 2025-12-04T12:12:57.6915655Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6915876Z collecting ... collected 380 items / 59 deselected / 321 selected 2025-12-04T12:12:57.6916062Z stepcurrent: skipping 59 already run items. 2025-12-04T12:12:57.6916175Z Running 116 items in this shard 2025-12-04T12:12:57.6916181Z 2025-12-04T12:12:57.6917184Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.6918082Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5353s] [ 1%] 2025-12-04T12:12:57.6918966Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1576s] [ 1%] 2025-12-04T12:12:57.6919786Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1566s] [ 1%] 2025-12-04T12:12:57.6919794Z 2025-12-04T12:12:57.6919932Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6920485Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6920605Z Traceback (most recent call last): 2025-12-04T12:12:57.6921067Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6921272Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6921477Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6921482Z 2025-12-04T12:12:57.6921701Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6922691Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6922701Z 2025-12-04T12:12:57.6922961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6923187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6923298Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6923425Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6923759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6923974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6924134Z graph_break [] 2025-12-04T12:12:57.6924344Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6925105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6925221Z warnings.warn( 2025-12-04T12:12:57.6925769Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6925906Z Traceback (most recent call last): 2025-12-04T12:12:57.6926368Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.6926594Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6926817Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6926825Z 2025-12-04T12:12:57.6927038Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6927961Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6928015Z 2025-12-04T12:12:57.6928279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6928491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6928614Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6928728Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6929060Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6929291Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6929390Z graph_break [] 2025-12-04T12:12:57.6929614Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6930339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6930441Z warnings.warn( 2025-12-04T12:12:57.6930672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6930784Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6930896Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6931128Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6931460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.6931570Z graph_break [] 2025-12-04T12:12:57.6931784Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6932496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6932615Z warnings.warn( 2025-12-04T12:12:57.6932760Z =================================== FAILURES =================================== 2025-12-04T12:12:57.6933312Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6933446Z Traceback (most recent call last): 2025-12-04T12:12:57.6933905Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6934110Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6934315Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6934320Z 2025-12-04T12:12:57.6934533Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6935471Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6935514Z 2025-12-04T12:12:57.6935772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6936026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6936136Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6936247Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6936587Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6936801Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 
1)] 2025-12-04T12:12:57.6936896Z graph_break [] 2025-12-04T12:12:57.6937150Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6937867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6937981Z warnings.warn( 2025-12-04T12:12:57.6938187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6938294Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6938416Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6938680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6939007Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6939115Z graph_break [] 2025-12-04T12:12:57.6939325Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6940046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6940143Z warnings.warn( 2025-12-04T12:12:57.6940349Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6940468Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6940579Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6940791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6941132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6941230Z graph_break [] 2025-12-04T12:12:57.6941453Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6942159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6942256Z warnings.warn( 2025-12-04T12:12:57.6943072Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml - 2025-12-04T12:12:57.6943243Z =========================== short test summary info ============================ 2025-12-04T12:12:57.6944309Z FAILED [0.1566s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6944319Z 2025-12-04T12:12:57.6944530Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6945453Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6945471Z 2025-12-04T12:12:57.6945737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6945913Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6946136Z ============= 1 failed, 1 skipped, 59 deselected, 2 rerun in 4.91s ============= 2025-12-04T12:12:57.6946267Z Got exit code 1 2025-12-04T12:12:57.6946371Z Retrying single test... 
2025-12-04T12:12:57.6947011Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml 2025-12-04T12:12:57.6947199Z ============================= test session starts ============================== 2025-12-04T12:12:57.6947552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6947659Z cachedir: .pytest_cache 2025-12-04T12:12:57.6948169Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6948304Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6948439Z configfile: pytest.ini 2025-12-04T12:12:57.6949013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6949248Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6950246Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6950403Z Running 1 items in this shard 2025-12-04T12:12:57.6950408Z 2025-12-04T12:12:57.6951290Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5303s] [100%] 2025-12-04T12:12:57.6952183Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [100%] 2025-12-04T12:12:57.6952981Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1542s] [100%] 2025-12-04T12:12:57.6952989Z 2025-12-04T12:12:57.6953124Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6953679Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6953799Z Traceback (most recent call last): 2025-12-04T12:12:57.6954272Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6954465Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6954676Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6954681Z 2025-12-04T12:12:57.6954902Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6955826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6955831Z 2025-12-04T12:12:57.6956105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6956322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6956434Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6956557Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6956889Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6957102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6957214Z graph_break [] 2025-12-04T12:12:57.6957425Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6958154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6958289Z warnings.warn( 2025-12-04T12:12:57.6958834Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6958998Z Traceback (most recent call last): 2025-12-04T12:12:57.6959457Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6959651Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6959870Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6959875Z 2025-12-04T12:12:57.6960084Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6961048Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6961055Z 2025-12-04T12:12:57.6961318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6961540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6961651Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6961790Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6962203Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6962420Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6962517Z graph_break [] 2025-12-04T12:12:57.6962743Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6963461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6963576Z warnings.warn( 2025-12-04T12:12:57.6963784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6963894Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6964022Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6964232Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6964563Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6964672Z graph_break [] 2025-12-04T12:12:57.6964879Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6965589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6965699Z warnings.warn( 2025-12-04T12:12:57.6965839Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6966405Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6966521Z Traceback (most recent call last): 2025-12-04T12:12:57.6966979Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6967186Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6967395Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6967400Z 2025-12-04T12:12:57.6967618Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6968536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6968541Z 2025-12-04T12:12:57.6968799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6969065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6969173Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6969297Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6969627Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6969872Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6969982Z graph_break [] 2025-12-04T12:12:57.6970190Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6970908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6971046Z warnings.warn( 
2025-12-04T12:12:57.6971256Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6971375Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6971488Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6971703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6972043Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6972138Z graph_break [] 2025-12-04T12:12:57.6972382Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6973102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6973201Z warnings.warn( 2025-12-04T12:12:57.6973422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6973533Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6973644Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6973871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6974199Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6974294Z graph_break [] 2025-12-04T12:12:57.6974514Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6975223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6975335Z warnings.warn( 2025-12-04T12:12:57.6976134Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml - 2025-12-04T12:12:57.6976301Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.6977367Z FAILED [0.1542s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6977375Z 2025-12-04T12:12:57.6977584Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6978518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6978525Z 2025-12-04T12:12:57.6978784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6978958Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.6979164Z ================== 1 failed, 174 deselected, 2 rerun in 4.89s ================== 2025-12-04T12:12:57.6979261Z Got exit code 1 2025-12-04T12:12:57.6979378Z Retrying single test... 
2025-12-04T12:12:57.6980004Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml 2025-12-04T12:12:57.6980196Z ============================= test session starts ============================== 2025-12-04T12:12:57.6980552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.6980660Z cachedir: .pytest_cache 2025-12-04T12:12:57.6981196Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.6981339Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.6981446Z configfile: pytest.ini 2025-12-04T12:12:57.6982039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.6982306Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.6983306Z stepcurrent: skipping 60 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6983436Z Running 1 items in this shard 2025-12-04T12:12:57.6983440Z 2025-12-04T12:12:57.6984333Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5375s] [100%] 2025-12-04T12:12:57.6985266Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1595s] [100%] 2025-12-04T12:12:57.6986072Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1563s] [100%] 2025-12-04T12:12:57.6986078Z 2025-12-04T12:12:57.6986232Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.6986777Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6986897Z Traceback (most recent call last): 2025-12-04T12:12:57.6987377Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6987573Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6987797Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6987802Z 2025-12-04T12:12:57.6988011Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6988936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6988941Z 2025-12-04T12:12:57.6989220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6989437Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6989564Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6989678Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6990012Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6990245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6990343Z graph_break [] 2025-12-04T12:12:57.6990553Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6991295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6991395Z warnings.warn( 2025-12-04T12:12:57.6991955Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6992107Z Traceback (most recent call last): 2025-12-04T12:12:57.6992561Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.6992797Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.6993005Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.6993010Z 2025-12-04T12:12:57.6993222Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.6994160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.6994202Z 2025-12-04T12:12:57.6994465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.6994695Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6994809Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6994922Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6995269Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6995487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6995630Z graph_break [] 2025-12-04T12:12:57.6995841Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6996558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6996670Z warnings.warn( 2025-12-04T12:12:57.6996880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.6996986Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.6997112Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.6997324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.6997665Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.6997757Z graph_break [] 2025-12-04T12:12:57.6997965Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.6998693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.6998789Z warnings.warn( 2025-12-04T12:12:57.6998929Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.6999483Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.6999599Z Traceback (most recent call last): 2025-12-04T12:12:57.7000069Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7000264Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7000465Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7000470Z 2025-12-04T12:12:57.7000691Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7001798Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7001803Z 2025-12-04T12:12:57.7002079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7002388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7002499Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7002624Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7003038Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7003266Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7003361Z graph_break [] 2025-12-04T12:12:57.7003571Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7004338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7004438Z warnings.warn( 
2025-12-04T12:12:57.7004646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7004770Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7004880Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7005134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7005478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7005575Z graph_break [] 2025-12-04T12:12:57.7005796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7006510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7006654Z warnings.warn( 2025-12-04T12:12:57.7006874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7006980Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7007090Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7007316Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7007645Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7007753Z graph_break [] 2025-12-04T12:12:57.7007963Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7008677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7008787Z warnings.warn( 2025-12-04T12:12:57.7009590Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml - 2025-12-04T12:12:57.7009770Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7010817Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7010825Z 2025-12-04T12:12:57.7011036Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7011970Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7011977Z 2025-12-04T12:12:57.7012238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7012430Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7012624Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.7012720Z Got exit code 1 2025-12-04T12:12:57.7013571Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7013972Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7014608Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml 2025-12-04T12:12:57.7014806Z ============================= test session starts ============================== 2025-12-04T12:12:57.7015147Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7015296Z cachedir: .pytest_cache 2025-12-04T12:12:57.7015806Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7015939Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7016045Z configfile: pytest.ini 2025-12-04T12:12:57.7016620Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7016884Z collecting ... collected 380 items / 61 deselected / 319 selected 2025-12-04T12:12:57.7017024Z stepcurrent: skipping 61 already run items. 2025-12-04T12:12:57.7017137Z Running 114 items in this shard 2025-12-04T12:12:57.7017143Z 2025-12-04T12:12:57.7018047Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5433s] [ 0%] 2025-12-04T12:12:57.7018961Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1590s] [ 0%] 2025-12-04T12:12:57.7019777Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1561s] [ 0%] 2025-12-04T12:12:57.7019783Z 2025-12-04T12:12:57.7019923Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7020477Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7020600Z Traceback (most recent call last): 2025-12-04T12:12:57.7021060Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7021264Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7021475Z ValueError: not enough values to unpack (expected 2, 
got 0) 2025-12-04T12:12:57.7021479Z 2025-12-04T12:12:57.7021685Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7022614Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7022619Z 2025-12-04T12:12:57.7022880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7023104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7023217Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7023332Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7023675Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7023890Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7024000Z graph_break [] 2025-12-04T12:12:57.7024210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7024930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7025040Z warnings.warn( 2025-12-04T12:12:57.7025584Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7025702Z Traceback (most recent call last): 2025-12-04T12:12:57.7026299Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7026490Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7026707Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7026713Z 2025-12-04T12:12:57.7026972Z To execute this test, run the following 
from the base repo dir: 2025-12-04T12:12:57.7027894Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7027912Z 2025-12-04T12:12:57.7028171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7028415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7028537Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7028649Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7028978Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7029203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7029297Z graph_break [] 2025-12-04T12:12:57.7029508Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7030267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7030365Z warnings.warn( 2025-12-04T12:12:57.7030585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7030692Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7030808Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7031035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7031365Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7031463Z graph_break [] 2025-12-04T12:12:57.7031682Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7032399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7032513Z warnings.warn( 2025-12-04T12:12:57.7032653Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7033204Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7033335Z Traceback (most recent call last): 2025-12-04T12:12:57.7033795Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7034003Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7034213Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7034217Z 2025-12-04T12:12:57.7034423Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7035357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7035365Z 2025-12-04T12:12:57.7035624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7035851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7035959Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7036072Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7036413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7036626Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7036760Z graph_break [] 2025-12-04T12:12:57.7048253Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7049194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7049313Z warnings.warn( 2025-12-04T12:12:57.7049558Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7049680Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7049815Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7050041Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7050426Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7050546Z graph_break [] 2025-12-04T12:12:57.7050773Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7051517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7051637Z warnings.warn( 2025-12-04T12:12:57.7051860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7052027Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7052146Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7052370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7052722Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7052824Z graph_break [] 2025-12-04T12:12:57.7053042Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7053799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7053906Z warnings.warn( 2025-12-04T12:12:57.7054724Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml - 2025-12-04T12:12:57.7054900Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7055958Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7055981Z 2025-12-04T12:12:57.7056196Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7057128Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7057136Z 2025-12-04T12:12:57.7057409Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7057588Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7057791Z ================== 1 failed, 61 deselected, 2 rerun in 4.91s =================== 2025-12-04T12:12:57.7057911Z Got exit code 1 2025-12-04T12:12:57.7058021Z Retrying single test... 
2025-12-04T12:12:57.7058671Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml 2025-12-04T12:12:57.7058839Z ============================= test session starts ============================== 2025-12-04T12:12:57.7059189Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7059313Z cachedir: .pytest_cache 2025-12-04T12:12:57.7059826Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7060006Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7060129Z configfile: pytest.ini 2025-12-04T12:12:57.7060743Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7060992Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7062008Z stepcurrent: skipping 61 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7062127Z Running 1 items in this shard 2025-12-04T12:12:57.7062132Z 2025-12-04T12:12:57.7063071Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5406s] [100%] 2025-12-04T12:12:57.7063965Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%] 2025-12-04T12:12:57.7064793Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1560s] [100%] 2025-12-04T12:12:57.7064833Z 2025-12-04T12:12:57.7064978Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7065542Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7065672Z Traceback (most recent call last): 2025-12-04T12:12:57.7066141Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7066360Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7066574Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7066580Z 2025-12-04T12:12:57.7066794Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7067731Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7067739Z 2025-12-04T12:12:57.7068036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7068278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7068408Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7068622Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7073350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7073600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7073712Z graph_break [] 2025-12-04T12:12:57.7073928Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7074645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7074749Z warnings.warn( 2025-12-04T12:12:57.7075287Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7075402Z Traceback (most recent call last): 2025-12-04T12:12:57.7075864Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7076052Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7076298Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7076305Z 2025-12-04T12:12:57.7076509Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7077457Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7077473Z 2025-12-04T12:12:57.7077737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7077948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7078068Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7078175Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7078535Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7078757Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7078851Z graph_break [] 2025-12-04T12:12:57.7079058Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7079779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7080495Z warnings.warn( 2025-12-04T12:12:57.7080710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7080813Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7080919Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7081138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7081460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7081555Z graph_break [] 2025-12-04T12:12:57.7081767Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7082570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7082677Z warnings.warn( 2025-12-04T12:12:57.7082814Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7083361Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7083482Z Traceback (most recent call last): 2025-12-04T12:12:57.7083935Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7084124Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7084333Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7084339Z 2025-12-04T12:12:57.7084540Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7106110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7106127Z 2025-12-04T12:12:57.7106415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7106634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7106748Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7106854Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7107188Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7107397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7107486Z graph_break [] 2025-12-04T12:12:57.7107705Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7108421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7108625Z warnings.warn( 
2025-12-04T12:12:57.7108836Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7108939Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7109108Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7109320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7109643Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7109744Z graph_break [] 2025-12-04T12:12:57.7109948Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7110705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7110812Z warnings.warn( 2025-12-04T12:12:57.7111016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7111130Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7111244Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7111457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7111872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7111967Z graph_break [] 2025-12-04T12:12:57.7112174Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7112897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7112996Z warnings.warn( 2025-12-04T12:12:57.7113816Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml - 2025-12-04T12:12:57.7113987Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7115056Z FAILED [0.1560s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7115078Z 2025-12-04T12:12:57.7115294Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7116220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7116225Z 2025-12-04T12:12:57.7116499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7116675Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7116886Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.7116983Z Got exit code 1 2025-12-04T12:12:57.7117089Z Retrying single test... 
2025-12-04T12:12:57.7117731Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml 2025-12-04T12:12:57.7117891Z ============================= test session starts ============================== 2025-12-04T12:12:57.7118236Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7118355Z cachedir: .pytest_cache 2025-12-04T12:12:57.7118868Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7119004Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7119108Z configfile: pytest.ini 2025-12-04T12:12:57.7119683Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7120010Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7121127Z stepcurrent: skipping 61 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7121246Z Running 1 items in this shard 2025-12-04T12:12:57.7121265Z 2025-12-04T12:12:57.7122219Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [100%] 2025-12-04T12:12:57.7123143Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1589s] [100%] 2025-12-04T12:12:57.7123970Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%] 2025-12-04T12:12:57.7123976Z 2025-12-04T12:12:57.7124115Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7124711Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7124830Z Traceback (most recent call last): 2025-12-04T12:12:57.7125293Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7125499Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7125705Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7125710Z 2025-12-04T12:12:57.7125932Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7126854Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7126859Z 2025-12-04T12:12:57.7127121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7127351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7127460Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7127591Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7127923Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7128138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7128246Z graph_break [] 2025-12-04T12:12:57.7128460Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7129182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7129292Z warnings.warn( 2025-12-04T12:12:57.7129840Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7129973Z Traceback (most recent call last): 2025-12-04T12:12:57.7130432Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7130625Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7130842Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7130848Z 2025-12-04T12:12:57.7131056Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7131984Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7132022Z 2025-12-04T12:12:57.7132281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7132521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7132646Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7132757Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7133089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7133315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7133411Z graph_break [] 2025-12-04T12:12:57.7133662Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7134383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7134486Z warnings.warn( 2025-12-04T12:12:57.7134708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7134814Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7134926Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7135188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7135519Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7135627Z graph_break [] 2025-12-04T12:12:57.7135835Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7136551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7136664Z warnings.warn( 2025-12-04T12:12:57.7136804Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7137365Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7137485Z Traceback (most recent call last): 2025-12-04T12:12:57.7137944Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7138152Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7138356Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7138361Z 2025-12-04T12:12:57.7138571Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7139513Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7139518Z 2025-12-04T12:12:57.7139780Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7140003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7140112Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7140224Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7140566Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7140781Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7140889Z graph_break [] 2025-12-04T12:12:57.7141098Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7141813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7141925Z warnings.warn( 
2025-12-04T12:12:57.7142134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7142273Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7142396Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7142612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7142983Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7143081Z graph_break [] 2025-12-04T12:12:57.7143288Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7144012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7144109Z warnings.warn( 2025-12-04T12:12:57.7144346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7144468Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7144578Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7144806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7145134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7145230Z graph_break [] 2025-12-04T12:12:57.7145451Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7146193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7146290Z warnings.warn( 2025-12-04T12:12:57.7147102Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml - 2025-12-04T12:12:57.7147271Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7148333Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7148341Z 2025-12-04T12:12:57.7148553Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7149475Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7149495Z 2025-12-04T12:12:57.7149756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7149932Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7150141Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.7150238Z Got exit code 1 2025-12-04T12:12:57.7151069Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7151485Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7152109Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml 2025-12-04T12:12:57.7152287Z ============================= test session starts ============================== 2025-12-04T12:12:57.7152627Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7152734Z cachedir: .pytest_cache 2025-12-04T12:12:57.7153257Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7153381Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7153504Z configfile: pytest.ini 2025-12-04T12:12:57.7154110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7154330Z collecting ... collected 380 items / 62 deselected / 318 selected 2025-12-04T12:12:57.7154484Z stepcurrent: skipping 62 already run items. 2025-12-04T12:12:57.7154628Z Running 113 items in this shard 2025-12-04T12:12:57.7154633Z 2025-12-04T12:12:57.7155648Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.7156583Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5302s] [ 1%] 2025-12-04T12:12:57.7157465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1578s] [ 1%] 2025-12-04T12:12:57.7158287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1549s] [ 1%] 2025-12-04T12:12:57.7158323Z 2025-12-04T12:12:57.7158464Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7159024Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7159144Z Traceback (most recent call last): 2025-12-04T12:12:57.7159607Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7159812Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7160019Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7160026Z 2025-12-04T12:12:57.7160250Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7161176Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7161183Z 2025-12-04T12:12:57.7161449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7161676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7161788Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7161918Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7162377Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7162591Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7162705Z graph_break [] 2025-12-04T12:12:57.7162921Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7163645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7163765Z warnings.warn( 2025-12-04T12:12:57.7164308Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7164439Z Traceback (most recent call last): 2025-12-04T12:12:57.7164899Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.7165098Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7165323Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7165328Z 2025-12-04T12:12:57.7165539Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7166528Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7166534Z 2025-12-04T12:12:57.7166825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7167041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7167162Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7167276Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7167631Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7167881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7167983Z graph_break [] 2025-12-04T12:12:57.7168209Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7168928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7169028Z warnings.warn( 2025-12-04T12:12:57.7169250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7169396Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7169526Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7169741Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7170073Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.7170185Z graph_break [] 2025-12-04T12:12:57.7170393Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7171104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7171216Z warnings.warn( 2025-12-04T12:12:57.7171356Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7171913Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7172035Z Traceback (most recent call last): 2025-12-04T12:12:57.7172498Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7172701Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7172908Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7172913Z 2025-12-04T12:12:57.7173119Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7174059Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7174066Z 2025-12-04T12:12:57.7174324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7174549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7174660Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7174773Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7175116Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7175329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 
1)] 2025-12-04T12:12:57.7175438Z graph_break [] 2025-12-04T12:12:57.7175647Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7176367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7176520Z warnings.warn( 2025-12-04T12:12:57.7176729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7176839Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7176965Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7177177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7177552Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7177649Z graph_break [] 2025-12-04T12:12:57.7177858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7178585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7178716Z warnings.warn( 2025-12-04T12:12:57.7178927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7179050Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7179162Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7179392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7179722Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7179878Z graph_break [] 2025-12-04T12:12:57.7180101Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7180814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7180912Z warnings.warn( 2025-12-04T12:12:57.7181725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml - 2025-12-04T12:12:57.7181892Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7182966Z FAILED [0.1549s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7182973Z 2025-12-04T12:12:57.7183190Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7184132Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7184137Z 2025-12-04T12:12:57.7184397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7184578Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7184803Z ============= 1 failed, 1 skipped, 62 deselected, 2 rerun in 4.90s ============= 2025-12-04T12:12:57.7184902Z Got exit code 1 2025-12-04T12:12:57.7185009Z Retrying single test... 
2025-12-04T12:12:57.7185648Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml 2025-12-04T12:12:57.7185809Z ============================= test session starts ============================== 2025-12-04T12:12:57.7186172Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7186278Z cachedir: .pytest_cache 2025-12-04T12:12:57.7186788Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7186924Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7187029Z configfile: pytest.ini 2025-12-04T12:12:57.7187607Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7187886Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7188887Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7189049Z Running 1 items in this shard 2025-12-04T12:12:57.7189054Z 2025-12-04T12:12:57.7189948Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5279s] [100%] 2025-12-04T12:12:57.7190888Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1583s] [100%] 2025-12-04T12:12:57.7191691Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1561s] [100%] 2025-12-04T12:12:57.7191698Z 2025-12-04T12:12:57.7191835Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7192396Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7192545Z Traceback (most recent call last): 2025-12-04T12:12:57.7193020Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7193213Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7193418Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7193423Z 2025-12-04T12:12:57.7193645Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7194571Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7194578Z 2025-12-04T12:12:57.7194850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7195064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7195177Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7195306Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7195644Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7195873Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7195972Z graph_break [] 2025-12-04T12:12:57.7196187Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7196919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7197020Z warnings.warn( 2025-12-04T12:12:57.7197580Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7197711Z Traceback (most recent call last): 2025-12-04T12:12:57.7198177Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7198385Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7198601Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7198606Z 2025-12-04T12:12:57.7198815Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7199756Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7199805Z 2025-12-04T12:12:57.7200066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7200292Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7200405Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7200519Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7201174Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7201401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7201498Z graph_break [] 2025-12-04T12:12:57.7201720Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7202556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7202673Z warnings.warn( 2025-12-04T12:12:57.7202889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7202999Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7203124Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7203341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7203674Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7203830Z graph_break [] 2025-12-04T12:12:57.7204041Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7204769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7204868Z warnings.warn( 2025-12-04T12:12:57.7205012Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7205576Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7205698Z Traceback (most recent call last): 2025-12-04T12:12:57.7206162Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7206374Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7206585Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7206590Z 2025-12-04T12:12:57.7206817Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7207742Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7207750Z 2025-12-04T12:12:57.7208067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7208279Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7208389Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7208516Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7208847Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7209062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7209172Z graph_break [] 2025-12-04T12:12:57.7209381Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7210091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7210203Z warnings.warn( 
2025-12-04T12:12:57.7210417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7210545Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7210657Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7210924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7211272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7211369Z graph_break [] 2025-12-04T12:12:57.7211623Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7212335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7212432Z warnings.warn( 2025-12-04T12:12:57.7212658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7212767Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7212911Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7213140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7213470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7213582Z graph_break [] 2025-12-04T12:12:57.7213795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7214508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7214655Z warnings.warn( 2025-12-04T12:12:57.7215459Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml - 2025-12-04T12:12:57.7215643Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7216695Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7216703Z 2025-12-04T12:12:57.7216915Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7217855Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7217863Z 2025-12-04T12:12:57.7218125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7218313Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7218509Z ================== 1 failed, 174 deselected, 2 rerun in 4.89s ================== 2025-12-04T12:12:57.7218607Z Got exit code 1 2025-12-04T12:12:57.7218730Z Retrying single test... 
2025-12-04T12:12:57.7219360Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml 2025-12-04T12:12:57.7219539Z ============================= test session starts ============================== 2025-12-04T12:12:57.7219882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7219992Z cachedir: .pytest_cache 2025-12-04T12:12:57.7220517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7220641Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7220748Z configfile: pytest.ini 2025-12-04T12:12:57.7221346Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7221568Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7222588Z stepcurrent: skipping 63 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7222744Z Running 1 items in this shard 2025-12-04T12:12:57.7222749Z 2025-12-04T12:12:57.7223673Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5425s] [100%] 2025-12-04T12:12:57.7224572Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%] 2025-12-04T12:12:57.7225414Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [100%] 2025-12-04T12:12:57.7225420Z 2025-12-04T12:12:57.7225575Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7226121Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7226253Z Traceback (most recent call last): 2025-12-04T12:12:57.7226715Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7226939Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7227160Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7227165Z 2025-12-04T12:12:57.7227375Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7228311Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7228316Z 2025-12-04T12:12:57.7228577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7228792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7228913Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7229028Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7229357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7229587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7229682Z graph_break [] 2025-12-04T12:12:57.7229906Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7230628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7230730Z warnings.warn( 2025-12-04T12:12:57.7231288Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7231408Z Traceback (most recent call last): 2025-12-04T12:12:57.7231870Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7232078Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7232288Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7232295Z 2025-12-04T12:12:57.7232519Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7233437Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7233443Z 2025-12-04T12:12:57.7233717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7233934Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7234076Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7234200Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7234529Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7234744Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7234854Z graph_break [] 2025-12-04T12:12:57.7235093Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7235817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7235928Z warnings.warn( 2025-12-04T12:12:57.7236138Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7236372Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7236486Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7236701Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7237042Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7237138Z graph_break [] 2025-12-04T12:12:57.7237348Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7238072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7238205Z warnings.warn( 2025-12-04T12:12:57.7238362Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7238906Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7239026Z Traceback (most recent call last): 2025-12-04T12:12:57.7239499Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7239693Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7239912Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7239917Z 2025-12-04T12:12:57.7240126Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7241054Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7241061Z 2025-12-04T12:12:57.7241336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7241547Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7241672Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7241783Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7242193Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7242430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7242528Z graph_break [] 2025-12-04T12:12:57.7242738Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7243482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7243586Z warnings.warn( 
2025-12-04T12:12:57.7243813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7243923Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7244036Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7244270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7244599Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7244775Z graph_break [] 2025-12-04T12:12:57.7245001Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7245719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7245832Z warnings.warn( 2025-12-04T12:12:57.7246084Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7246194Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7246326Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7246543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7246873Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7247016Z graph_break [] 2025-12-04T12:12:57.7247229Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7247958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7248059Z warnings.warn( 2025-12-04T12:12:57.7248859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml - 2025-12-04T12:12:57.7249078Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7250129Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7250136Z 2025-12-04T12:12:57.7250366Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7251291Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7251299Z 2025-12-04T12:12:57.7251564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7251758Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7251956Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.7252066Z Got exit code 1 2025-12-04T12:12:57.7252916Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7253321Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7253963Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml 2025-12-04T12:12:57.7254127Z ============================= test session starts ============================== 2025-12-04T12:12:57.7254485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7254594Z cachedir: .pytest_cache 2025-12-04T12:12:57.7255101Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7255236Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7255343Z configfile: pytest.ini 2025-12-04T12:12:57.7255920Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7256156Z collecting ... collected 380 items / 64 deselected / 316 selected 2025-12-04T12:12:57.7256295Z stepcurrent: skipping 64 already run items. 2025-12-04T12:12:57.7256419Z Running 111 items in this shard 2025-12-04T12:12:57.7256474Z 2025-12-04T12:12:57.7257367Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5344s] [ 0%] 2025-12-04T12:12:57.7258287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [ 0%] 2025-12-04T12:12:57.7259107Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1580s] [ 0%] 2025-12-04T12:12:57.7259112Z 2025-12-04T12:12:57.7259289Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7259849Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7259970Z Traceback (most recent call last): 2025-12-04T12:12:57.7260432Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7260641Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7260850Z ValueError: not enough values to unpack (expected 2, 
got 0) 2025-12-04T12:12:57.7260887Z 2025-12-04T12:12:57.7261110Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7262027Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7262032Z 2025-12-04T12:12:57.7262307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7262522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7262633Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7262760Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7263090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7263305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7263419Z graph_break [] 2025-12-04T12:12:57.7263634Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7264371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7264471Z warnings.warn( 2025-12-04T12:12:57.7265023Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7265159Z Traceback (most recent call last): 2025-12-04T12:12:57.7265618Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7265813Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7266030Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7266035Z 2025-12-04T12:12:57.7266242Z To execute this test, run the following 
from the base repo dir: 2025-12-04T12:12:57.7267179Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7267186Z 2025-12-04T12:12:57.7267446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7267658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7267780Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7267892Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7268236Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7268531Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7268628Z graph_break [] 2025-12-04T12:12:57.7268854Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7269611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7269713Z warnings.warn( 2025-12-04T12:12:57.7269937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7270045Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7270175Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7270426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7270762Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7270876Z graph_break [] 2025-12-04T12:12:57.7271083Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7271794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7271949Z warnings.warn( 2025-12-04T12:12:57.7272091Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7272651Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7272771Z Traceback (most recent call last): 2025-12-04T12:12:57.7273235Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7273445Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7273655Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7273663Z 2025-12-04T12:12:57.7273871Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7274810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7274817Z 2025-12-04T12:12:57.7275078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7275305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7275416Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7275530Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7275878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7276092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7276207Z graph_break [] 2025-12-04T12:12:57.7276417Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7277130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7277250Z warnings.warn( 2025-12-04T12:12:57.7277462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7277573Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7277700Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7277915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7278257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7278356Z graph_break [] 2025-12-04T12:12:57.7278565Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7279337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7279436Z warnings.warn( 2025-12-04T12:12:57.7279647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7279805Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7279922Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7280151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7280477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7280574Z graph_break [] 2025-12-04T12:12:57.7280801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7281541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7281644Z warnings.warn( 2025-12-04T12:12:57.7282532Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml - 2025-12-04T12:12:57.7282704Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7283826Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7283832Z 2025-12-04T12:12:57.7284044Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7284982Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7284989Z 2025-12-04T12:12:57.7285248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7285422Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7285627Z ================== 1 failed, 64 deselected, 2 rerun in 4.91s =================== 2025-12-04T12:12:57.7285728Z Got exit code 1 2025-12-04T12:12:57.7285835Z Retrying single test... 
2025-12-04T12:12:57.7286479Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml 2025-12-04T12:12:57.7286636Z ============================= test session starts ============================== 2025-12-04T12:12:57.7286989Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7287100Z cachedir: .pytest_cache 2025-12-04T12:12:57.7287606Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7287741Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7287849Z configfile: pytest.ini 2025-12-04T12:12:57.7288436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7288663Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7289664Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7289790Z Running 1 items in this shard 2025-12-04T12:12:57.7289795Z 2025-12-04T12:12:57.7290687Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5461s] [100%] 2025-12-04T12:12:57.7291618Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1594s] [100%] 2025-12-04T12:12:57.7292465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1576s] [100%] 2025-12-04T12:12:57.7292473Z 2025-12-04T12:12:57.7292624Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7293166Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7293288Z Traceback (most recent call last): 2025-12-04T12:12:57.7293794Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7293989Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7294201Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7294206Z 2025-12-04T12:12:57.7294428Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7295353Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7295389Z 2025-12-04T12:12:57.7295664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7295880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7295991Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7296116Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7296449Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7296675Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7296772Z graph_break [] 2025-12-04T12:12:57.7296980Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7297713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7297814Z warnings.warn( 2025-12-04T12:12:57.7298360Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7298491Z Traceback (most recent call last): 2025-12-04T12:12:57.7298950Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7299157Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7299363Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7299369Z 2025-12-04T12:12:57.7299577Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7300506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7300515Z 2025-12-04T12:12:57.7300776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7301182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7301296Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7301411Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7301759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7301976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7302074Z graph_break [] 2025-12-04T12:12:57.7302300Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7303106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7303222Z warnings.warn( 2025-12-04T12:12:57.7303473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7303587Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7303715Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7303932Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7304260Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7304370Z graph_break [] 2025-12-04T12:12:57.7304631Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7305357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7305455Z warnings.warn( 2025-12-04T12:12:57.7305594Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7306155Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7306318Z Traceback (most recent call last): 2025-12-04T12:12:57.7306793Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7306988Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7307194Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7307199Z 2025-12-04T12:12:57.7307423Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7308353Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7308360Z 2025-12-04T12:12:57.7308639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7308854Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7308967Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7309098Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7309428Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7309643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7309751Z graph_break [] 2025-12-04T12:12:57.7309966Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7310696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7310798Z warnings.warn( 
2025-12-04T12:12:57.7311008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7311131Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7311243Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7311464Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7311803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7311899Z graph_break [] 2025-12-04T12:12:57.7312106Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7312832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7312930Z warnings.warn( 2025-12-04T12:12:57.7313152Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7313294Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7313407Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7313637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7313993Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7314095Z graph_break [] 2025-12-04T12:12:57.7314316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7315026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7315135Z warnings.warn( 2025-12-04T12:12:57.7315961Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml - 2025-12-04T12:12:57.7316133Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7317189Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7317227Z 2025-12-04T12:12:57.7317439Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7318371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7318377Z 2025-12-04T12:12:57.7318636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7318823Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7319018Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.7319118Z Got exit code 1 2025-12-04T12:12:57.7319238Z Retrying single test... 
2025-12-04T12:12:57.7319862Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml 2025-12-04T12:12:57.7320026Z ============================= test session starts ============================== 2025-12-04T12:12:57.7320377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7320482Z cachedir: .pytest_cache 2025-12-04T12:12:57.7321009Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7321131Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7321241Z configfile: pytest.ini 2025-12-04T12:12:57.7321831Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7322057Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7323151Z stepcurrent: skipping 64 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7323269Z Running 1 items in this shard 2025-12-04T12:12:57.7323274Z 2025-12-04T12:12:57.7324156Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5599s] [100%] 2025-12-04T12:12:57.7325059Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1637s] [100%] 2025-12-04T12:12:57.7325900Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1593s] [100%] 2025-12-04T12:12:57.7325905Z 2025-12-04T12:12:57.7326056Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7326638Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7326759Z Traceback (most recent call last): 2025-12-04T12:12:57.7327241Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7327434Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7327699Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7327705Z 2025-12-04T12:12:57.7327916Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7328840Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7328858Z 2025-12-04T12:12:57.7329117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7329369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7329492Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7329602Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7329933Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7330160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7330272Z graph_break [] 2025-12-04T12:12:57.7330483Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7331213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7331314Z warnings.warn( 2025-12-04T12:12:57.7331873Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7331999Z Traceback (most recent call last): 2025-12-04T12:12:57.7332459Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7332667Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7332873Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7332877Z 2025-12-04T12:12:57.7333105Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7334024Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7334031Z 2025-12-04T12:12:57.7334291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7334519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7334633Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7334762Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7335094Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7335311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7335423Z graph_break [] 2025-12-04T12:12:57.7335634Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7336350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7336502Z warnings.warn( 2025-12-04T12:12:57.7336716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7336841Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7336955Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7337205Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7337552Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7337649Z graph_break [] 2025-12-04T12:12:57.7337857Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7338585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7338712Z warnings.warn( 2025-12-04T12:12:57.7339144Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7340020Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7340145Z Traceback (most recent call last): 2025-12-04T12:12:57.7340632Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7340886Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7341097Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7341117Z 2025-12-04T12:12:57.7341337Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7342288Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7342294Z 2025-12-04T12:12:57.7342578Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7342801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7342914Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7343041Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7343383Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7343618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7343717Z graph_break [] 2025-12-04T12:12:57.7343933Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7344809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7344914Z warnings.warn( 
2025-12-04T12:12:57.7345129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7345251Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7345367Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7345601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7345940Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7346037Z graph_break [] 2025-12-04T12:12:57.7346271Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7346998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7347102Z warnings.warn( 2025-12-04T12:12:57.7347334Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7347447Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7347578Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7347804Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7348186Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7348298Z graph_break [] 2025-12-04T12:12:57.7348517Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7349284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7349401Z warnings.warn( 2025-12-04T12:12:57.7350221Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml - 2025-12-04T12:12:57.7350407Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7351529Z FAILED [0.1593s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7351538Z 2025-12-04T12:12:57.7351759Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7352728Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7352765Z 2025-12-04T12:12:57.7353035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7353228Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7353428Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.7353534Z Got exit code 1 2025-12-04T12:12:57.7354411Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7354829Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7355479Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml 2025-12-04T12:12:57.7355772Z ============================= test session starts ============================== 2025-12-04T12:12:57.7356113Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7356234Z cachedir: .pytest_cache 2025-12-04T12:12:57.7356739Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7356872Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7356976Z configfile: pytest.ini 2025-12-04T12:12:57.7357549Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7357784Z collecting ... collected 380 items / 65 deselected / 315 selected 2025-12-04T12:12:57.7357923Z stepcurrent: skipping 65 already run items. 2025-12-04T12:12:57.7358033Z Running 110 items in this shard 2025-12-04T12:12:57.7358052Z 2025-12-04T12:12:57.7359053Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.7359940Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [ 1%] 2025-12-04T12:12:57.7360834Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1575s] [ 1%] 2025-12-04T12:12:57.7361668Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [ 1%] 2025-12-04T12:12:57.7361674Z 2025-12-04T12:12:57.7361867Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7362469Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7362593Z Traceback (most recent call last): 2025-12-04T12:12:57.7363071Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7363303Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7363526Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7363534Z 2025-12-04T12:12:57.7363743Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7364662Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7364713Z 2025-12-04T12:12:57.7364975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7365191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7365318Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7365429Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7365758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7365986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7366082Z graph_break [] 2025-12-04T12:12:57.7366295Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7367031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7367129Z warnings.warn( 2025-12-04T12:12:57.7367686Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7367807Z Traceback (most recent call last): 2025-12-04T12:12:57.7368271Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.7368474Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7368678Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7368685Z 2025-12-04T12:12:57.7368905Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7369823Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7369831Z 2025-12-04T12:12:57.7370090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7370320Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7370432Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7370558Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7370888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7371101Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7371211Z graph_break [] 2025-12-04T12:12:57.7371423Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7372141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7372285Z warnings.warn( 2025-12-04T12:12:57.7372494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7372616Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7372781Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7372994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7373335Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.7373430Z graph_break [] 2025-12-04T12:12:57.7373639Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7374391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7374489Z warnings.warn( 2025-12-04T12:12:57.7374643Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7375191Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7375309Z Traceback (most recent call last): 2025-12-04T12:12:57.7375815Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7376008Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7376214Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7376231Z 2025-12-04T12:12:57.7376437Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7377361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7377368Z 2025-12-04T12:12:57.7377638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7377851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7377960Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7378085Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7378417Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7378644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 
1)] 2025-12-04T12:12:57.7378739Z graph_break [] 2025-12-04T12:12:57.7378947Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7379679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7379777Z warnings.warn( 2025-12-04T12:12:57.7379987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7380107Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7380218Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7380444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7380780Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7380877Z graph_break [] 2025-12-04T12:12:57.7381104Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7381816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7381916Z warnings.warn( 2025-12-04T12:12:57.7382140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7382248Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7382406Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7382622Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7382949Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7383057Z graph_break [] 2025-12-04T12:12:57.7383293Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7384004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: 
Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7384114Z warnings.warn( 2025-12-04T12:12:57.7384913Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml - 2025-12-04T12:12:57.7385122Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7386174Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7386182Z 2025-12-04T12:12:57.7386403Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7387356Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7387361Z 2025-12-04T12:12:57.7387621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7387810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7388020Z ============= 1 failed, 1 skipped, 65 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.7388128Z Got exit code 1 2025-12-04T12:12:57.7388233Z Retrying single test... 
2025-12-04T12:12:57.7388862Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml
2025-12-04T12:12:57.7389035Z ============================= test session starts ==============================
2025-12-04T12:12:57.7389377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7389486Z cachedir: .pytest_cache
2025-12-04T12:12:57.7390005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7390126Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7390247Z configfile: pytest.ini
2025-12-04T12:12:57.7390832Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7391056Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7392074Z stepcurrent: skipping 66 already run items.
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7392188Z Running 1 items in this shard 2025-12-04T12:12:57.7392195Z 2025-12-04T12:12:57.7393095Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5396s] [100%] 2025-12-04T12:12:57.7393978Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1596s] [100%] 2025-12-04T12:12:57.7394785Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1592s] [100%] 2025-12-04T12:12:57.7394834Z 2025-12-04T12:12:57.7394974Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7395523Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7395690Z Traceback (most recent call last): 2025-12-04T12:12:57.7396156Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7396349Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7396571Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7396576Z 2025-12-04T12:12:57.7396784Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7397754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7397762Z 2025-12-04T12:12:57.7398024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7398239Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7398370Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7398515Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7398865Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7399080Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7399178Z graph_break [] 2025-12-04T12:12:57.7399406Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7400130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7400233Z warnings.warn( 2025-12-04T12:12:57.7400797Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7401097Z Traceback (most recent call last): 2025-12-04T12:12:57.7401580Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7401778Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7401987Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7401992Z 2025-12-04T12:12:57.7402268Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7403189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7403195Z 2025-12-04T12:12:57.7403473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7403683Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7403794Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7403921Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7404255Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7404472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7404581Z graph_break [] 2025-12-04T12:12:57.7404791Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7405529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7405632Z warnings.warn( 2025-12-04T12:12:57.7405840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7406062Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7406172Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7406385Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7406728Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7406869Z graph_break [] 2025-12-04T12:12:57.7407092Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7407802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7407898Z warnings.warn( 2025-12-04T12:12:57.7408096Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7408641Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7408779Z Traceback (most recent call last): 2025-12-04T12:12:57.7409245Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7409437Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7409658Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7409702Z 2025-12-04T12:12:57.7409911Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7410829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7410850Z 2025-12-04T12:12:57.7411111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7411324Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7411450Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7411562Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7411892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7412118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7412216Z graph_break [] 2025-12-04T12:12:57.7412441Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7413161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7413258Z warnings.warn( 
2025-12-04T12:12:57.7413478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7413588Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7413699Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7413926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7414256Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7414364Z graph_break [] 2025-12-04T12:12:57.7414574Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7415285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7415396Z warnings.warn( 2025-12-04T12:12:57.7415603Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7415710Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7415832Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7416046Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7416384Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7416529Z graph_break [] 2025-12-04T12:12:57.7416736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7417454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7417662Z warnings.warn( 2025-12-04T12:12:57.7418466Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml - 2025-12-04T12:12:57.7418652Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7419737Z FAILED [0.1592s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7419744Z 2025-12-04T12:12:57.7419978Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7420900Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7420905Z 2025-12-04T12:12:57.7421217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7421395Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7421591Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.7421706Z Got exit code 1 2025-12-04T12:12:57.7421814Z Retrying single test... 
2025-12-04T12:12:57.7422447Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml
2025-12-04T12:12:57.7422621Z ============================= test session starts ==============================
2025-12-04T12:12:57.7422966Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7423086Z cachedir: .pytest_cache
2025-12-04T12:12:57.7423592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7423718Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7423842Z configfile: pytest.ini
2025-12-04T12:12:57.7424413Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7424635Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7425652Z stepcurrent: skipping 66 already run items.
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7425768Z Running 1 items in this shard 2025-12-04T12:12:57.7425773Z 2025-12-04T12:12:57.7426669Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5591s] [100%] 2025-12-04T12:12:57.7427555Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1620s] [100%] 2025-12-04T12:12:57.7428378Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1589s] [100%] 2025-12-04T12:12:57.7428383Z 2025-12-04T12:12:57.7428521Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7429061Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7429224Z Traceback (most recent call last): 2025-12-04T12:12:57.7429687Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7429895Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7430135Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7430141Z 2025-12-04T12:12:57.7430349Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7431287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7431321Z 2025-12-04T12:12:57.7431583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7431807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7431919Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7432030Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7432370Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7432585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7432710Z graph_break [] 2025-12-04T12:12:57.7432939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7433657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7433772Z warnings.warn( 2025-12-04T12:12:57.7434318Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7434436Z Traceback (most recent call last): 2025-12-04T12:12:57.7434907Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7435099Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7435315Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7435320Z 2025-12-04T12:12:57.7435532Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7436451Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7436456Z 2025-12-04T12:12:57.7436726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7436940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7437061Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7437174Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7437505Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7437732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7437826Z graph_break [] 2025-12-04T12:12:57.7438037Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7438770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7438866Z warnings.warn( 2025-12-04T12:12:57.7439090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7439197Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7439310Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7439537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7439898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7439993Z graph_break [] 2025-12-04T12:12:57.7440217Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7440964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7441081Z warnings.warn( 2025-12-04T12:12:57.7441224Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7441767Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7441901Z Traceback (most recent call last): 2025-12-04T12:12:57.7442460Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7442655Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7442882Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7442887Z 2025-12-04T12:12:57.7443096Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7444034Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7444070Z 2025-12-04T12:12:57.7444331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7444543Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7444664Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7444779Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7445123Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7445335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7445431Z graph_break [] 2025-12-04T12:12:57.7445652Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7446372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7446473Z warnings.warn( 
2025-12-04T12:12:57.7446693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7446803Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7446927Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7447139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7447470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7447577Z graph_break [] 2025-12-04T12:12:57.7447796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7448509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7448619Z warnings.warn( 2025-12-04T12:12:57.7448829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7448955Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7449066Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7449284Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7449626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7449722Z graph_break [] 2025-12-04T12:12:57.7449933Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7450658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7450789Z warnings.warn( 2025-12-04T12:12:57.7451605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml - 2025-12-04T12:12:57.7451802Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7453004Z FAILED [0.1589s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7453025Z 2025-12-04T12:12:57.7453239Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7454202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7454210Z 2025-12-04T12:12:57.7454488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7454664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7454874Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.7455027Z Got exit code 1 2025-12-04T12:12:57.7455870Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7456290Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7456922Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml 2025-12-04T12:12:57.7457081Z ============================= test session starts ============================== 2025-12-04T12:12:57.7457442Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7457551Z cachedir: .pytest_cache 2025-12-04T12:12:57.7458077Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7458201Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7458307Z configfile: pytest.ini 2025-12-04T12:12:57.7458892Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7459114Z collecting ... collected 380 items / 67 deselected / 313 selected 2025-12-04T12:12:57.7459267Z stepcurrent: skipping 67 already run items. 2025-12-04T12:12:57.7459380Z Running 108 items in this shard 2025-12-04T12:12:57.7459385Z 2025-12-04T12:12:57.7460387Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [ 0%] 2025-12-04T12:12:57.7461400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0032s] (Skip non-critical tests to save resources.) 
[ 1%] 2025-12-04T12:12:57.7462285Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5539s] [ 2%] 2025-12-04T12:12:57.7463173Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [ 2%] 2025-12-04T12:12:57.7463971Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1565s] [ 2%] 2025-12-04T12:12:57.7464011Z 2025-12-04T12:12:57.7464169Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7464736Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7464859Z Traceback (most recent call last): 2025-12-04T12:12:57.7465334Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7465532Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7465738Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7465785Z 2025-12-04T12:12:57.7465996Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7466912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7466919Z 2025-12-04T12:12:57.7467188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7467403Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.7467543Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7467668Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7467998Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7468224Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7468319Z graph_break [] 2025-12-04T12:12:57.7468531Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7469262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7469366Z warnings.warn( 2025-12-04T12:12:57.7469909Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7470042Z Traceback (most recent call last): 2025-12-04T12:12:57.7470503Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7470710Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7470914Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7470919Z 2025-12-04T12:12:57.7471126Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7472057Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7472064Z 2025-12-04T12:12:57.7472324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7472557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7472665Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7472781Z stats 
[('calls_captured', 10)] 2025-12-04T12:12:57.7473124Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7473336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7473446Z graph_break [] 2025-12-04T12:12:57.7473657Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7474378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7474527Z warnings.warn( 2025-12-04T12:12:57.7474735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7474841Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7474966Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7475179Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7475549Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7475648Z graph_break [] 2025-12-04T12:12:57.7475858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7476582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7476709Z warnings.warn( 2025-12-04T12:12:57.7476851Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7477403Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7477522Z Traceback (most recent call last): 2025-12-04T12:12:57.7477993Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7478190Z 
act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7478432Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7478437Z 2025-12-04T12:12:57.7478657Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7479574Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7479580Z 2025-12-04T12:12:57.7479855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7480065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7480174Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7480301Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7480632Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7480849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7480958Z graph_break [] 2025-12-04T12:12:57.7481166Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7481896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7481994Z warnings.warn( 2025-12-04T12:12:57.7482273Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7482398Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7482513Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7482729Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7483071Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7483167Z graph_break 
[] 2025-12-04T12:12:57.7483393Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7484109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7484207Z warnings.warn( 2025-12-04T12:12:57.7484431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7484539Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7484652Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7484883Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7485253Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7485361Z graph_break [] 2025-12-04T12:12:57.7485570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7486309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7486424Z warnings.warn( 2025-12-04T12:12:57.7487222Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml - 2025-12-04T12:12:57.7487388Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7488481Z FAILED [0.1565s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7488489Z 2025-12-04T12:12:57.7488703Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7489636Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python 
test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7489670Z 2025-12-04T12:12:57.7489932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7490122Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7490331Z ============= 1 failed, 2 skipped, 67 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:57.7490430Z Got exit code 1 2025-12-04T12:12:57.7490548Z Retrying single test... 2025-12-04T12:12:57.7491172Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml 2025-12-04T12:12:57.7491333Z ============================= test session starts ============================== 2025-12-04T12:12:57.7491687Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7491793Z cachedir: .pytest_cache 2025-12-04T12:12:57.7492311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7492435Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7492540Z configfile: pytest.ini 2025-12-04T12:12:57.7493131Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7493355Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7494367Z stepcurrent: skipping 69 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7494480Z Running 1 items in this shard 2025-12-04T12:12:57.7494485Z 2025-12-04T12:12:57.7495370Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5628s] [100%] 2025-12-04T12:12:57.7496262Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1626s] [100%] 2025-12-04T12:12:57.7497059Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1576s] [100%] 2025-12-04T12:12:57.7497065Z 2025-12-04T12:12:57.7497213Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7497793Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7497924Z Traceback (most recent call last): 2025-12-04T12:12:57.7498420Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7498616Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7498834Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7498839Z 2025-12-04T12:12:57.7499051Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7500012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7500031Z 2025-12-04T12:12:57.7500291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7500505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7500627Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7500739Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7501247Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7501550Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7501647Z graph_break [] 2025-12-04T12:12:57.7501873Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7502599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7502700Z warnings.warn( 2025-12-04T12:12:57.7503256Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7503379Z Traceback (most recent call last): 2025-12-04T12:12:57.7503835Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7504045Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7504256Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7504261Z 2025-12-04T12:12:57.7504484Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7505401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7505407Z 2025-12-04T12:12:57.7505667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7505892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7506004Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7506133Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7506464Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7506679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7506792Z graph_break [] 2025-12-04T12:12:57.7507001Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7507716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7507827Z warnings.warn( 2025-12-04T12:12:57.7508037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7508158Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7508271Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7508540Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7508879Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7508979Z graph_break [] 2025-12-04T12:12:57.7509188Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7509957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7510058Z warnings.warn( 2025-12-04T12:12:57.7510212Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7510752Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7510914Z Traceback (most recent call last): 2025-12-04T12:12:57.7511388Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7511586Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7511792Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7511811Z 2025-12-04T12:12:57.7512018Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7512967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7512972Z 2025-12-04T12:12:57.7513249Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7513462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7513587Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7513702Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7514034Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7514266Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7514362Z graph_break [] 2025-12-04T12:12:57.7514575Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7515304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7515407Z warnings.warn( 
2025-12-04T12:12:57.7515631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7515741Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7515856Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7516089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7516420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7516521Z graph_break [] 2025-12-04T12:12:57.7516748Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7517463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7517580Z warnings.warn( 2025-12-04T12:12:57.7517788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7517896Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7518021Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7518237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7518566Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7518681Z graph_break [] 2025-12-04T12:12:57.7518891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7519643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7519753Z warnings.warn( 2025-12-04T12:12:57.7520585Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml - 2025-12-04T12:12:57.7520770Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7521816Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7521822Z 2025-12-04T12:12:57.7522074Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7523048Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7523057Z 2025-12-04T12:12:57.7523322Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7523512Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7523747Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.7523858Z Got exit code 1 2025-12-04T12:12:57.7523963Z Retrying single test... 
2025-12-04T12:12:57.7524594Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml 2025-12-04T12:12:57.7524772Z ============================= test session starts ============================== 2025-12-04T12:12:57.7525120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7525228Z cachedir: .pytest_cache 2025-12-04T12:12:57.7525751Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7525876Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7525995Z configfile: pytest.ini 2025-12-04T12:12:57.7526577Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7526803Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7527815Z stepcurrent: skipping 69 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7527933Z Running 1 items in this shard 2025-12-04T12:12:57.7527938Z 2025-12-04T12:12:57.7528832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5446s] [100%] 2025-12-04T12:12:57.7529716Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%] 2025-12-04T12:12:57.7530520Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1568s] [100%] 2025-12-04T12:12:57.7530540Z 2025-12-04T12:12:57.7530676Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7531219Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7531350Z Traceback (most recent call last): 2025-12-04T12:12:57.7531849Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7532044Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7532262Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7532267Z 2025-12-04T12:12:57.7532500Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7533431Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7533436Z 2025-12-04T12:12:57.7533694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7533937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7534063Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7534174Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7534519Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7534734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7534828Z graph_break [] 2025-12-04T12:12:57.7535053Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7535800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7535901Z warnings.warn( 2025-12-04T12:12:57.7536450Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7536568Z Traceback (most recent call last): 2025-12-04T12:12:57.7537040Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7537233Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7537442Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7537447Z 2025-12-04T12:12:57.7537666Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7538581Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7538588Z 2025-12-04T12:12:57.7538859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7539072Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7539181Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7539309Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7539639Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7539869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7539963Z graph_break [] 2025-12-04T12:12:57.7540174Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7540908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7541008Z warnings.warn( 2025-12-04T12:12:57.7541217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7541335Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7541447Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7541662Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7542005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7542101Z graph_break [] 2025-12-04T12:12:57.7542323Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7543066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7543164Z warnings.warn( 2025-12-04T12:12:57.7543362Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7543906Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7544040Z Traceback (most recent call last): 2025-12-04T12:12:57.7544502Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7544728Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7544948Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7544953Z 2025-12-04T12:12:57.7545164Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7546079Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7546098Z 2025-12-04T12:12:57.7546395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7546605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7546726Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7546837Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7547165Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7547392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7547488Z graph_break [] 2025-12-04T12:12:57.7547711Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7548430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7548529Z warnings.warn( 
2025-12-04T12:12:57.7548753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7548862Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7548972Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7549194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7549523Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7549633Z graph_break [] 2025-12-04T12:12:57.7549844Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7550551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7550663Z warnings.warn( 2025-12-04T12:12:57.7550872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7550980Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7551103Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7551321Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7551660Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7551754Z graph_break [] 2025-12-04T12:12:57.7551961Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7552681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7552780Z warnings.warn( 2025-12-04T12:12:57.7553578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml - 2025-12-04T12:12:57.7553792Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7554867Z FAILED [0.1568s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7554875Z 2025-12-04T12:12:57.7555098Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7556038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7556043Z 2025-12-04T12:12:57.7556317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7556496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7556689Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.7556797Z Got exit code 1 2025-12-04T12:12:57.7557633Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7558076Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7558704Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml 2025-12-04T12:12:57.7558867Z ============================= test session starts ============================== 2025-12-04T12:12:57.7559224Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7559331Z cachedir: .pytest_cache 2025-12-04T12:12:57.7559838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7559972Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7560083Z configfile: pytest.ini 2025-12-04T12:12:57.7560677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7560898Z collecting ... collected 380 items / 70 deselected / 310 selected 2025-12-04T12:12:57.7561042Z stepcurrent: skipping 70 already run items. 2025-12-04T12:12:57.7561169Z Running 105 items in this shard 2025-12-04T12:12:57.7561174Z 2025-12-04T12:12:57.7562061Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5778s] [ 0%] 2025-12-04T12:12:57.7563017Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1616s] [ 0%] 2025-12-04T12:12:57.7563825Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1604s] [ 0%] 2025-12-04T12:12:57.7563833Z 2025-12-04T12:12:57.7563969Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7564520Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7564637Z Traceback (most recent call last): 2025-12-04T12:12:57.7565118Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7565314Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7565560Z ValueError: not enough values to unpack (expected 2, got 
0) 2025-12-04T12:12:57.7565565Z 2025-12-04T12:12:57.7565790Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7566736Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7566743Z 2025-12-04T12:12:57.7567019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7567233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7567345Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7567500Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7567832Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7568062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7568158Z graph_break [] 2025-12-04T12:12:57.7568368Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7569112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7569242Z warnings.warn( 2025-12-04T12:12:57.7569781Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7569911Z Traceback (most recent call last): 2025-12-04T12:12:57.7570374Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7570585Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7570791Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7570798Z 2025-12-04T12:12:57.7571009Z To execute this test, run the following from 
the base repo dir: 2025-12-04T12:12:57.7571937Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7571948Z 2025-12-04T12:12:57.7572210Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7572436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7572544Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7572659Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7573006Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7573223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7573321Z graph_break [] 2025-12-04T12:12:57.7573551Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7574270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7574385Z warnings.warn( 2025-12-04T12:12:57.7574598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7574708Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7574836Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7575050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7575380Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7575495Z graph_break [] 2025-12-04T12:12:57.7575710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7576435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7576570Z warnings.warn( 2025-12-04T12:12:57.7576710Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7577295Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7577417Z Traceback (most recent call last): 2025-12-04T12:12:57.7577879Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7578087Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7578293Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7578298Z 2025-12-04T12:12:57.7578550Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7579468Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7579475Z 2025-12-04T12:12:57.7579750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7579965Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7580105Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7580235Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7580569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7580786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7580897Z graph_break [] 2025-12-04T12:12:57.7581111Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7581824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla 
T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7581940Z warnings.warn( 2025-12-04T12:12:57.7582148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7582268Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7582380Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7582600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7582938Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7583031Z graph_break [] 2025-12-04T12:12:57.7583237Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7583960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7584059Z warnings.warn( 2025-12-04T12:12:57.7584280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7584389Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7584498Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7584722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7585049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7585147Z graph_break [] 2025-12-04T12:12:57.7585364Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7586076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7586188Z warnings.warn( 2025-12-04T12:12:57.7586992Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml - 2025-12-04T12:12:57.7587208Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7588262Z FAILED [0.1604s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7588300Z 2025-12-04T12:12:57.7588512Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7589447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7589452Z 2025-12-04T12:12:57.7589746Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7589934Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7590127Z ================== 1 failed, 70 deselected, 2 rerun in 4.95s =================== 2025-12-04T12:12:57.7590226Z Got exit code 1 2025-12-04T12:12:57.7590346Z Retrying single test... 
2025-12-04T12:12:57.7590973Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml 2025-12-04T12:12:57.7591161Z ============================= test session starts ============================== 2025-12-04T12:12:57.7591514Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7591619Z cachedir: .pytest_cache 2025-12-04T12:12:57.7592137Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7592258Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7592367Z configfile: pytest.ini 2025-12-04T12:12:57.7592956Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7593180Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7594172Z stepcurrent: skipping 70 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7594300Z Running 1 items in this shard 2025-12-04T12:12:57.7594305Z 2025-12-04T12:12:57.7595183Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5346s] [100%] 2025-12-04T12:12:57.7596073Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%] 2025-12-04T12:12:57.7596869Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1567s] [100%] 2025-12-04T12:12:57.7596874Z 2025-12-04T12:12:57.7597024Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7597563Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7597684Z Traceback (most recent call last): 2025-12-04T12:12:57.7598161Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7598354Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7598577Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7598583Z 2025-12-04T12:12:57.7598791Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7599818Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7599823Z 2025-12-04T12:12:57.7600103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7600355Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7600481Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7600596Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7601090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7601326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7601425Z graph_break [] 2025-12-04T12:12:57.7601703Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7602527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7602632Z warnings.warn( 2025-12-04T12:12:57.7603187Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7603356Z Traceback (most recent call last): 2025-12-04T12:12:57.7603825Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7604032Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7604240Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7604245Z 2025-12-04T12:12:57.7604470Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7605382Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7605390Z 2025-12-04T12:12:57.7605650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7605874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7605989Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7612585Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7613018Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7613243Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7613357Z graph_break [] 2025-12-04T12:12:57.7613579Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7614322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7614442Z warnings.warn( 2025-12-04T12:12:57.7614656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7614767Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7614893Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7615112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7615464Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7615562Z graph_break [] 2025-12-04T12:12:57.7615772Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7616507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7616607Z warnings.warn( 2025-12-04T12:12:57.7616750Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7617430Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7617550Z Traceback (most recent call last): 2025-12-04T12:12:57.7618027Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7618270Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7618481Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7618490Z 2025-12-04T12:12:57.7618714Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7619670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7619677Z 2025-12-04T12:12:57.7619962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7620180Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7620293Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7620411Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7620744Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7621000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7621109Z graph_break [] 2025-12-04T12:12:57.7621319Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7622046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7622146Z warnings.warn( 
2025-12-04T12:12:57.7622354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7622474Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7622586Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7622799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7623142Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7623237Z graph_break [] 2025-12-04T12:12:57.7623464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7624183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7624284Z warnings.warn( 2025-12-04T12:12:57.7624506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7624615Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7624725Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7624952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7625280Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7625388Z graph_break [] 2025-12-04T12:12:57.7625595Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7626304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7626418Z warnings.warn( 2025-12-04T12:12:57.7627217Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml - 2025-12-04T12:12:57.7627385Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7628454Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7628491Z 2025-12-04T12:12:57.7628705Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7629667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7629675Z 2025-12-04T12:12:57.7629937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7630120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7630313Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.7630411Z Got exit code 1 2025-12-04T12:12:57.7630557Z Retrying single test... 
2025-12-04T12:12:57.7631186Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml 2025-12-04T12:12:57.7631348Z ============================= test session starts ============================== 2025-12-04T12:12:57.7631701Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7631807Z cachedir: .pytest_cache 2025-12-04T12:12:57.7632381Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7632500Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7632606Z configfile: pytest.ini 2025-12-04T12:12:57.7633196Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7633418Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7634427Z stepcurrent: skipping 70 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7634544Z Running 1 items in this shard 2025-12-04T12:12:57.7634549Z 2025-12-04T12:12:57.7635438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5494s] [100%] 2025-12-04T12:12:57.7636338Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1614s] [100%] 2025-12-04T12:12:57.7637140Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1574s] [100%] 2025-12-04T12:12:57.7637146Z 2025-12-04T12:12:57.7637300Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7637841Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7637975Z Traceback (most recent call last): 2025-12-04T12:12:57.7638436Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7638630Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7638849Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7638854Z 2025-12-04T12:12:57.7639064Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7639979Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7639996Z 2025-12-04T12:12:57.7640288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7640505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7640628Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7640740Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7641105Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7641337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7641436Z graph_break [] 2025-12-04T12:12:57.7641659Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7642497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7642601Z warnings.warn( 2025-12-04T12:12:57.7643154Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7643277Z Traceback (most recent call last): 2025-12-04T12:12:57.7643741Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7643952Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7644194Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7644199Z 2025-12-04T12:12:57.7644426Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7645344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7645349Z 2025-12-04T12:12:57.7645615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7645842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7645957Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7646085Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7646415Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7646631Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7646745Z graph_break [] 2025-12-04T12:12:57.7646955Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7647675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7647786Z warnings.warn( 2025-12-04T12:12:57.7647998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7648123Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7648236Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7648451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7648791Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7648886Z graph_break [] 2025-12-04T12:12:57.7649094Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7649818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7649917Z warnings.warn( 2025-12-04T12:12:57.7650071Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7650615Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7650733Z Traceback (most recent call last): 2025-12-04T12:12:57.7651207Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7651433Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7651639Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7651657Z 2025-12-04T12:12:57.7651864Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7652811Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7652817Z 2025-12-04T12:12:57.7653085Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7653325Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7653449Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7653560Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7653892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7654122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7654218Z graph_break [] 2025-12-04T12:12:57.7654430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7655192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7655294Z warnings.warn( 
2025-12-04T12:12:57.7655501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7655622Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7655731Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7655959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7656289Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7656386Z graph_break [] 2025-12-04T12:12:57.7656610Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7657320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7657420Z warnings.warn( 2025-12-04T12:12:57.7657643Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7657751Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7657872Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7658085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7658413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7658518Z graph_break [] 2025-12-04T12:12:57.7658724Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7659434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7659542Z warnings.warn( 2025-12-04T12:12:57.7660343Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml - 2025-12-04T12:12:57.7660522Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7661568Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7661576Z 2025-12-04T12:12:57.7661799Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7662715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7662766Z 2025-12-04T12:12:57.7663029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7663244Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7663440Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.7663552Z Got exit code 1 2025-12-04T12:12:57.7664375Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7664807Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7665446Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml 2025-12-04T12:12:57.7665608Z ============================= test session starts ============================== 2025-12-04T12:12:57.7665962Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7666069Z cachedir: .pytest_cache 2025-12-04T12:12:57.7666609Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7666744Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7666849Z configfile: pytest.ini 2025-12-04T12:12:57.7667426Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7667662Z collecting ... collected 380 items / 71 deselected / 309 selected 2025-12-04T12:12:57.7667801Z stepcurrent: skipping 71 already run items. 2025-12-04T12:12:57.7667927Z Running 104 items in this shard 2025-12-04T12:12:57.7667931Z 2025-12-04T12:12:57.7668814Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5602s] [ 0%] 2025-12-04T12:12:57.7669687Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1665s] [ 0%] 2025-12-04T12:12:57.7670494Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1618s] [ 0%] 2025-12-04T12:12:57.7670500Z 2025-12-04T12:12:57.7670639Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7671185Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7671307Z Traceback (most recent call last): 2025-12-04T12:12:57.7671770Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7671976Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7672187Z ValueError: not enough values to unpack (expected 2, got 0) 
2025-12-04T12:12:57.7672192Z 2025-12-04T12:12:57.7672415Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7673328Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7673333Z 2025-12-04T12:12:57.7673592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7673819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7673963Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7674090Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7674423Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7674639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7674779Z graph_break [] 2025-12-04T12:12:57.7674991Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7677699Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7677809Z return x.grad, w.grad 2025-12-04T12:12:57.7678523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7678671Z warnings.warn( 2025-12-04T12:12:57.7681304Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7681424Z return x.grad, w.grad 2025-12-04T12:12:57.7681955Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7682089Z Traceback (most recent call last): 2025-12-04T12:12:57.7682618Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7682817Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7683038Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7683044Z 2025-12-04T12:12:57.7683254Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7684180Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7684188Z 2025-12-04T12:12:57.7684444Z This message can be suppressed by setting 
PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7684652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7684775Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7684884Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7685219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7685446Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7685541Z graph_break [] 2025-12-04T12:12:57.7685762Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7688416Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7688570Z return x.grad, w.grad 2025-12-04T12:12:57.7689317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7689417Z warnings.warn( 2025-12-04T12:12:57.7692088Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7692194Z return x.grad, w.grad 2025-12-04T12:12:57.7692422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7692560Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7692672Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7692905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7693234Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7693346Z graph_break [] 2025-12-04T12:12:57.7693560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7696200Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7696322Z return x.grad, w.grad 2025-12-04T12:12:57.7697035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7697144Z warnings.warn( 2025-12-04T12:12:57.7699805Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7699927Z return x.grad, w.grad 2025-12-04T12:12:57.7700068Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7700602Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7700733Z Traceback (most recent call last): 2025-12-04T12:12:57.7701486Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7701696Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7701980Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7701985Z 2025-12-04T12:12:57.7702195Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7703157Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 
2025-12-04T12:12:57.7703167Z 2025-12-04T12:12:57.7703430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7703652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7703761Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7703874Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7704261Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7704480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7704580Z graph_break [] 2025-12-04T12:12:57.7704801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7707454Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7707616Z return x.grad, w.grad 2025-12-04T12:12:57.7708337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7708454Z warnings.warn( 2025-12-04T12:12:57.7711088Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7711211Z return x.grad, w.grad 2025-12-04T12:12:57.7711426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7711534Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7711659Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7711877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7712214Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7712325Z graph_break [] 2025-12-04T12:12:57.7712537Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7715205Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7715307Z return x.grad, w.grad 2025-12-04T12:12:57.7716063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7716160Z warnings.warn( 2025-12-04T12:12:57.7718852Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7718970Z return x.grad, w.grad 2025-12-04T12:12:57.7719183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7719307Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7719417Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7719633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7719975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7720104Z graph_break [] 2025-12-04T12:12:57.7720328Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7721042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7721139Z warnings.warn( 2025-12-04T12:12:57.7723859Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor 
is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7723973Z return x.grad, w.grad 2025-12-04T12:12:57.7724791Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml - 2025-12-04T12:12:57.7724962Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7726002Z FAILED [0.1618s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7726025Z 2025-12-04T12:12:57.7726237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7727147Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7727155Z 2025-12-04T12:12:57.7727436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7727613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7727819Z ================== 1 failed, 71 deselected, 2 rerun in 4.94s =================== 2025-12-04T12:12:57.7727916Z Got exit code 1 2025-12-04T12:12:57.7728019Z Retrying single test... 
2025-12-04T12:12:57.7728658Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml 2025-12-04T12:12:57.7728817Z ============================= test session starts ============================== 2025-12-04T12:12:57.7729199Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7729316Z cachedir: .pytest_cache 2025-12-04T12:12:57.7729867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7730004Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7730108Z configfile: pytest.ini 2025-12-04T12:12:57.7730682Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7730918Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7731938Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7732064Z Running 1 items in this shard 2025-12-04T12:12:57.7732069Z 2025-12-04T12:12:57.7732939Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5479s] [100%] 2025-12-04T12:12:57.7733842Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1655s] [100%] 2025-12-04T12:12:57.7734646Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1612s] [100%] 2025-12-04T12:12:57.7734652Z 2025-12-04T12:12:57.7734790Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7735339Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7735458Z Traceback (most recent call last): 2025-12-04T12:12:57.7735920Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7736127Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7736337Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7736342Z 2025-12-04T12:12:57.7736563Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7737474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7737482Z 2025-12-04T12:12:57.7737757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7737970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7738085Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7738210Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7738541Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7738761Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7738871Z graph_break [] 2025-12-04T12:12:57.7739079Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7741736Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7741875Z return x.grad, w.grad 2025-12-04T12:12:57.7742620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7742732Z warnings.warn( 2025-12-04T12:12:57.7745397Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7745519Z return x.grad, w.grad 2025-12-04T12:12:57.7746053Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7746184Z Traceback (most recent call last): 2025-12-04T12:12:57.7746673Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7746866Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7747084Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7747089Z 2025-12-04T12:12:57.7747297Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7748221Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7748229Z 2025-12-04T12:12:57.7748489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7748701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7748823Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7748938Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7749282Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7749498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.7749596Z graph_break [] 2025-12-04T12:12:57.7749818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7752464Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7752588Z return x.grad, w.grad 2025-12-04T12:12:57.7753297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7753396Z warnings.warn( 2025-12-04T12:12:57.7756051Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7756190Z return x.grad, w.grad 2025-12-04T12:12:57.7756444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7756555Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7756667Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7756898Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7757229Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7757339Z graph_break [] 2025-12-04T12:12:57.7757578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7760228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7760380Z return x.grad, w.grad 2025-12-04T12:12:57.7761097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7761209Z warnings.warn( 2025-12-04T12:12:57.7763908Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7764035Z return x.grad, w.grad 2025-12-04T12:12:57.7764179Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7764736Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7764857Z Traceback (most recent call last): 2025-12-04T12:12:57.7765319Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7765531Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7765744Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7765750Z 2025-12-04T12:12:57.7765974Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7766893Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7766901Z 2025-12-04T12:12:57.7767162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7767383Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7767493Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7767601Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7767944Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7768161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7768301Z graph_break [] 2025-12-04T12:12:57.7768515Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7771195Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7771340Z return x.grad, w.grad 2025-12-04T12:12:57.7772056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7772165Z warnings.warn( 2025-12-04T12:12:57.7774921Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7775087Z return x.grad, w.grad 2025-12-04T12:12:57.7775305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7775412Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7775538Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7775755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7776097Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7776193Z graph_break [] 2025-12-04T12:12:57.7776401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7779057Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7779162Z return x.grad, w.grad 2025-12-04T12:12:57.7779890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7779986Z warnings.warn( 2025-12-04T12:12:57.7782638Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7782743Z return x.grad, w.grad 2025-12-04T12:12:57.7782952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7783130Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7783242Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7783470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7783836Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7783934Z graph_break [] 2025-12-04T12:12:57.7784157Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7784875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7784972Z warnings.warn( 2025-12-04T12:12:57.7787686Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7787819Z return x.grad, w.grad 2025-12-04T12:12:57.7788639Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml - 2025-12-04T12:12:57.7788805Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7789857Z FAILED [0.1612s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7789865Z 2025-12-04T12:12:57.7790074Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7790999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7791006Z 2025-12-04T12:12:57.7791265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7791440Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7791640Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.7791734Z Got exit code 1 2025-12-04T12:12:57.7791839Z Retrying single test... 
2025-12-04T12:12:57.7792470Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml 2025-12-04T12:12:57.7792628Z ============================= test session starts ============================== 2025-12-04T12:12:57.7792978Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7793082Z cachedir: .pytest_cache 2025-12-04T12:12:57.7793589Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7793722Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7793827Z configfile: pytest.ini 2025-12-04T12:12:57.7794405Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7794641Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7795631Z stepcurrent: skipping 71 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7795801Z Running 1 items in this shard 2025-12-04T12:12:57.7795806Z 2025-12-04T12:12:57.7796717Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5620s] [100%] 2025-12-04T12:12:57.7797602Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1681s] [100%] 2025-12-04T12:12:57.7798400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1648s] [100%] 2025-12-04T12:12:57.7798435Z 2025-12-04T12:12:57.7798574Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7799121Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7799240Z Traceback (most recent call last): 2025-12-04T12:12:57.7799715Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7800017Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7800225Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7800230Z 2025-12-04T12:12:57.7800448Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7801519Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7801525Z 2025-12-04T12:12:57.7801794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7802007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7802180Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7802309Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7802641Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7802871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7802967Z graph_break [] 2025-12-04T12:12:57.7803176Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7805839Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7805945Z return x.grad, w.grad 2025-12-04T12:12:57.7806678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7806778Z warnings.warn( 2025-12-04T12:12:57.7809427Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7809611Z return x.grad, w.grad 2025-12-04T12:12:57.7810144Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7810322Z Traceback (most recent call last): 2025-12-04T12:12:57.7810781Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7810989Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7811194Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7811201Z 2025-12-04T12:12:57.7811412Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7812380Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7812388Z 2025-12-04T12:12:57.7812649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7812876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7812984Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7813140Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7813487Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7813696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.7813790Z graph_break [] 2025-12-04T12:12:57.7814010Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7816654Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7816774Z return x.grad, w.grad 2025-12-04T12:12:57.7817491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7817604Z warnings.warn( 2025-12-04T12:12:57.7820237Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7820354Z return x.grad, w.grad 2025-12-04T12:12:57.7820570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7820680Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7820802Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7821020Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7821350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7821457Z graph_break [] 2025-12-04T12:12:57.7821668Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7824342Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7824477Z return x.grad, w.grad 2025-12-04T12:12:57.7825205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7825331Z warnings.warn( 2025-12-04T12:12:57.7827965Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7828123Z return x.grad, w.grad 2025-12-04T12:12:57.7828263Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7828807Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.7828927Z Traceback (most recent call last): 2025-12-04T12:12:57.7829386Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7829591Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7829797Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7829802Z 2025-12-04T12:12:57.7830020Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7830938Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7830946Z 2025-12-04T12:12:57.7831204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7831425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7831533Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7831658Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7831988Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7832204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7832310Z graph_break [] 2025-12-04T12:12:57.7832521Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7835179Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7835282Z return x.grad, w.grad 2025-12-04T12:12:57.7835991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7836134Z warnings.warn( 2025-12-04T12:12:57.7838793Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7838908Z return x.grad, w.grad 2025-12-04T12:12:57.7839151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7839271Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7839383Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7839602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7839942Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7840037Z graph_break [] 2025-12-04T12:12:57.7840247Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7842990Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7843094Z return x.grad, w.grad 2025-12-04T12:12:57.7843817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7843914Z warnings.warn( 2025-12-04T12:12:57.7846567Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.7846670Z return x.grad, w.grad 2025-12-04T12:12:57.7846884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7847005Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7847116Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7847347Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7847677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7847774Z graph_break [] 2025-12-04T12:12:57.7847992Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7848703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7848797Z warnings.warn( 2025-12-04T12:12:57.7851490Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.7851625Z return x.grad, w.grad 2025-12-04T12:12:57.7852434Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml - 2025-12-04T12:12:57.7852598Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7853680Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7853689Z 2025-12-04T12:12:57.7853901Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7854822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7854858Z 2025-12-04T12:12:57.7855116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7855286Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.7855492Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.7855586Z Got exit code 1 2025-12-04T12:12:57.7856423Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.7856826Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7857448Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml 2025-12-04T12:12:57.7857625Z ============================= test session starts ============================== 2025-12-04T12:12:57.7857968Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7858073Z cachedir: .pytest_cache 2025-12-04T12:12:57.7858593Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7858716Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7858835Z configfile: pytest.ini 2025-12-04T12:12:57.7859412Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7859631Z collecting ... collected 380 items / 72 deselected / 308 selected 2025-12-04T12:12:57.7859783Z stepcurrent: skipping 72 already run items. 2025-12-04T12:12:57.7859894Z Running 103 items in this shard 2025-12-04T12:12:57.7859898Z 2025-12-04T12:12:57.7860906Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) 
[ 0%] 2025-12-04T12:12:57.7861899Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.7862875Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0035s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.7863797Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5402s] [ 3%] 2025-12-04T12:12:57.7864699Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1601s] [ 3%] 2025-12-04T12:12:57.7865510Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1587s] [ 3%] 2025-12-04T12:12:57.7865515Z 2025-12-04T12:12:57.7865681Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7866231Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7866353Z Traceback (most recent call last): 2025-12-04T12:12:57.7866813Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7867019Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7867257Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7867262Z 2025-12-04T12:12:57.7867483Z To execute this test, run 
the following from the base repo dir: 2025-12-04T12:12:57.7868412Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7868417Z 2025-12-04T12:12:57.7868685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7868912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7869023Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7869147Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7869479Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7869696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7869804Z graph_break [] 2025-12-04T12:12:57.7870013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7870729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7870841Z warnings.warn( 2025-12-04T12:12:57.7871381Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7871512Z Traceback (most recent call last): 2025-12-04T12:12:57.7871970Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7872166Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7872384Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7872389Z 2025-12-04T12:12:57.7872600Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7873532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 
python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7873537Z 2025-12-04T12:12:57.7873800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7874012Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7874129Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7874242Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7874621Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7874847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7874945Z graph_break [] 2025-12-04T12:12:57.7875193Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7875909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7876007Z warnings.warn( 2025-12-04T12:12:57.7876223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7876329Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7876485Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7876714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7877044Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7877150Z graph_break [] 2025-12-04T12:12:57.7877355Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7878064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 
2025-12-04T12:12:57.7878206Z warnings.warn( 2025-12-04T12:12:57.7878345Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7878899Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7879016Z Traceback (most recent call last): 2025-12-04T12:12:57.7879476Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7879675Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7879881Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7879886Z 2025-12-04T12:12:57.7880092Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7881020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7881028Z 2025-12-04T12:12:57.7881287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7881509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7881623Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7881736Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7882080Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7882362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7882475Z graph_break [] 2025-12-04T12:12:57.7882685Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7883405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, 
skipping 2025-12-04T12:12:57.7883519Z warnings.warn( 2025-12-04T12:12:57.7883728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7883834Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7883956Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7884169Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7884513Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7884612Z graph_break [] 2025-12-04T12:12:57.7884821Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7885590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7885687Z warnings.warn( 2025-12-04T12:12:57.7885923Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7886048Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7886161Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7886377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7886717Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7886811Z graph_break [] 2025-12-04T12:12:57.7887062Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7887773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7887872Z warnings.warn( 2025-12-04T12:12:57.7888675Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml - 2025-12-04T12:12:57.7888842Z 
=========================== short test summary info ============================ 2025-12-04T12:12:57.7889932Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7889938Z 2025-12-04T12:12:57.7890152Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7891066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7891086Z 2025-12-04T12:12:57.7891344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7891519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7891743Z ============= 1 failed, 3 skipped, 72 deselected, 2 rerun in 4.92s ============= 2025-12-04T12:12:57.7891841Z Got exit code 1 2025-12-04T12:12:57.7891942Z Retrying single test... 
2025-12-04T12:12:57.7892575Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml 2025-12-04T12:12:57.7892731Z ============================= test session starts ============================== 2025-12-04T12:12:57.7893084Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7893188Z cachedir: .pytest_cache 2025-12-04T12:12:57.7893698Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7893829Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7893933Z configfile: pytest.ini 2025-12-04T12:12:57.7894504Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7894740Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7895738Z stepcurrent: skipping 75 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7895862Z Running 1 items in this shard 2025-12-04T12:12:57.7895867Z 2025-12-04T12:12:57.7896754Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5548s] [100%] 2025-12-04T12:12:57.7897678Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%] 2025-12-04T12:12:57.7898517Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1573s] [100%] 2025-12-04T12:12:57.7898525Z 2025-12-04T12:12:57.7898660Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7899206Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7899354Z Traceback (most recent call last): 2025-12-04T12:12:57.7899831Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7900024Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7900231Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7900236Z 2025-12-04T12:12:57.7900456Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7901547Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7901621Z 2025-12-04T12:12:57.7901896Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7902110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7902220Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7902352Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7902683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7902899Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7903008Z graph_break [] 2025-12-04T12:12:57.7903223Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7903958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7904057Z warnings.warn( 2025-12-04T12:12:57.7904596Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7904726Z Traceback (most recent call last): 2025-12-04T12:12:57.7905186Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7905390Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7905593Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7905601Z 2025-12-04T12:12:57.7905808Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7906739Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7906746Z 2025-12-04T12:12:57.7907003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7907224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7907336Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7907449Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7907791Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7908002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7908146Z graph_break [] 2025-12-04T12:12:57.7908371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7909086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7909195Z warnings.warn( 2025-12-04T12:12:57.7909445Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7909555Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7909681Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7909895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7910223Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7910373Z graph_break [] 2025-12-04T12:12:57.7910583Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7911304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7911405Z warnings.warn( 2025-12-04T12:12:57.7911545Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7912090Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7912240Z Traceback (most recent call last): 2025-12-04T12:12:57.7912699Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7912902Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7913109Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7913115Z 2025-12-04T12:12:57.7913334Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7914245Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7914253Z 2025-12-04T12:12:57.7914509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7914737Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7914845Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7914969Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7915298Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7915513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7915616Z graph_break [] 2025-12-04T12:12:57.7915823Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7916536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7916648Z warnings.warn( 
2025-12-04T12:12:57.7916867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7916988Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7917102Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7917318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7917661Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7917757Z graph_break [] 2025-12-04T12:12:57.7917965Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7918695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7918796Z warnings.warn( 2025-12-04T12:12:57.7919056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7919166Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7919278Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7919508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7919886Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7919983Z graph_break [] 2025-12-04T12:12:57.7920205Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7920919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7921032Z warnings.warn( 2025-12-04T12:12:57.7921863Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml - 2025-12-04T12:12:57.7922035Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7923172Z FAILED [0.1573s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7923217Z 2025-12-04T12:12:57.7923429Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7924363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7924368Z 2025-12-04T12:12:57.7924631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7924810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7925024Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.7925124Z Got exit code 1 2025-12-04T12:12:57.7925245Z Retrying single test... 
2025-12-04T12:12:57.7925875Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml 2025-12-04T12:12:57.7926038Z ============================= test session starts ============================== 2025-12-04T12:12:57.7926391Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7926499Z cachedir: .pytest_cache 2025-12-04T12:12:57.7927018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7927139Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7927246Z configfile: pytest.ini 2025-12-04T12:12:57.7927835Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7928059Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7929057Z stepcurrent: skipping 75 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7929184Z Running 1 items in this shard 2025-12-04T12:12:57.7929190Z 2025-12-04T12:12:57.7930073Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5263s] [100%] 2025-12-04T12:12:57.7930964Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1605s] [100%] 2025-12-04T12:12:57.7931803Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1582s] [100%] 2025-12-04T12:12:57.7931809Z 2025-12-04T12:12:57.7931960Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7932536Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7932658Z Traceback (most recent call last): 2025-12-04T12:12:57.7933129Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7933322Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7933572Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7933578Z 2025-12-04T12:12:57.7933787Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7934698Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7934703Z 2025-12-04T12:12:57.7934976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7935222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7935343Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7935453Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7935783Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7936011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7936108Z graph_break [] 2025-12-04T12:12:57.7936320Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7937056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7937152Z warnings.warn( 2025-12-04T12:12:57.7937702Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7937824Z Traceback (most recent call last): 2025-12-04T12:12:57.7938282Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7938487Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7938692Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7938697Z 2025-12-04T12:12:57.7938906Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7939834Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7939841Z 2025-12-04T12:12:57.7940103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7940327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7940440Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7940554Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7940894Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7941108Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7941215Z graph_break [] 2025-12-04T12:12:57.7941427Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7942142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7942302Z warnings.warn( 2025-12-04T12:12:57.7942513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7942623Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7942744Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7942986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7943332Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7943424Z graph_break [] 2025-12-04T12:12:57.7943633Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7944390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7944490Z warnings.warn( 2025-12-04T12:12:57.7944629Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.7945186Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.7945303Z Traceback (most recent call last): 2025-12-04T12:12:57.7945775Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7946022Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7946232Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7946237Z 2025-12-04T12:12:57.7946458Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7947378Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7947383Z 2025-12-04T12:12:57.7947654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7947865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7947973Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7948098Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7948429Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7948658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7948754Z graph_break [] 2025-12-04T12:12:57.7948964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7949696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7949794Z warnings.warn( 
2025-12-04T12:12:57.7950005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7950128Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7950238Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7950449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7950789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7950889Z graph_break [] 2025-12-04T12:12:57.7951111Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7951820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7951916Z warnings.warn( 2025-12-04T12:12:57.7952137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7952247Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7952358Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7952581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7952943Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7953052Z graph_break [] 2025-12-04T12:12:57.7953259Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7953996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7954105Z warnings.warn( 2025-12-04T12:12:57.7954903Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml - 2025-12-04T12:12:57.7955113Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.7956156Z FAILED [0.1582s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7956164Z 2025-12-04T12:12:57.7956375Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7957304Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7957341Z 2025-12-04T12:12:57.7957601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7957790Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7957984Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.7958081Z Got exit code 1 2025-12-04T12:12:57.7958923Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.7959322Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.7959957Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml 2025-12-04T12:12:57.7960120Z ============================= test session starts ============================== 2025-12-04T12:12:57.7960460Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7960577Z cachedir: .pytest_cache 2025-12-04T12:12:57.7961083Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7961214Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7961319Z configfile: pytest.ini 2025-12-04T12:12:57.7961894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7962195Z collecting ... collected 380 items / 76 deselected / 304 selected 2025-12-04T12:12:57.7962339Z stepcurrent: skipping 76 already run items. 2025-12-04T12:12:57.7962449Z Running 99 items in this shard 2025-12-04T12:12:57.7962460Z 2025-12-04T12:12:57.7963359Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5334s] [ 1%] 2025-12-04T12:12:57.7964234Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1607s] [ 1%] 2025-12-04T12:12:57.7965044Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1567s] [ 1%] 2025-12-04T12:12:57.7965110Z 2025-12-04T12:12:57.7965247Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7965797Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7965949Z Traceback (most recent call last): 2025-12-04T12:12:57.7966410Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7966616Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7966823Z ValueError: not enough values to unpack (expected 2, got 
0) 2025-12-04T12:12:57.7966828Z 2025-12-04T12:12:57.7967081Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7968007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7968014Z 2025-12-04T12:12:57.7968275Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7968501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7968643Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7968756Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7969102Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7969318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7969431Z graph_break [] 2025-12-04T12:12:57.7969642Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7970365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7970477Z warnings.warn( 2025-12-04T12:12:57.7971012Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7971141Z Traceback (most recent call last): 2025-12-04T12:12:57.7971605Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7971800Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7972017Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7972022Z 2025-12-04T12:12:57.7972226Z To execute this test, run the following from 
the base repo dir: 2025-12-04T12:12:57.7973140Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7973160Z 2025-12-04T12:12:57.7973419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7973630Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7973748Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7973858Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7974193Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7974418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7974516Z graph_break [] 2025-12-04T12:12:57.7974743Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7975459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7975557Z warnings.warn( 2025-12-04T12:12:57.7975775Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7975923Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7976033Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7976260Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7976618Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7976733Z graph_break [] 2025-12-04T12:12:57.7976942Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7977650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7977762Z warnings.warn( 2025-12-04T12:12:57.7977932Z =================================== FAILURES =================================== 2025-12-04T12:12:57.7978478Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7978612Z Traceback (most recent call last): 2025-12-04T12:12:57.7979074Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7979284Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7979522Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7979528Z 2025-12-04T12:12:57.7979737Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7980670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7980675Z 2025-12-04T12:12:57.7980940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7981167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7981281Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7981394Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7981741Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7981954Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7982054Z graph_break [] 2025-12-04T12:12:57.7982278Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7982999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla 
T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7983116Z warnings.warn( 2025-12-04T12:12:57.7983327Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7983436Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7983561Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7983774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7984106Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7984220Z graph_break [] 2025-12-04T12:12:57.7984430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7985158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7985257Z warnings.warn( 2025-12-04T12:12:57.7985465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.7985588Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.7985702Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.7985917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.7986257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.7986391Z graph_break [] 2025-12-04T12:12:57.7986614Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.7987323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.7987453Z warnings.warn( 2025-12-04T12:12:57.7988270Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml - 2025-12-04T12:12:57.7988437Z =========================== short test summary info ============================ 2025-12-04T12:12:57.7989605Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7989614Z 2025-12-04T12:12:57.7989826Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.7990737Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7990775Z 2025-12-04T12:12:57.7991051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.7991224Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.7991431Z ================== 1 failed, 76 deselected, 2 rerun in 4.90s =================== 2025-12-04T12:12:57.7991528Z Got exit code 1 2025-12-04T12:12:57.7991632Z Retrying single test... 
2025-12-04T12:12:57.7992274Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml 2025-12-04T12:12:57.7992433Z ============================= test session starts ============================== 2025-12-04T12:12:57.7992776Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.7992896Z cachedir: .pytest_cache 2025-12-04T12:12:57.7993405Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.7993540Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.7993648Z configfile: pytest.ini 2025-12-04T12:12:57.7994224Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.7994460Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.7995459Z stepcurrent: skipping 76 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.7995589Z Running 1 items in this shard 2025-12-04T12:12:57.7995593Z 2025-12-04T12:12:57.7996472Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5653s] [100%] 2025-12-04T12:12:57.7997353Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%] 2025-12-04T12:12:57.7998162Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1579s] [100%] 2025-12-04T12:12:57.7998167Z 2025-12-04T12:12:57.7998311Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.7998862Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.7999018Z Traceback (most recent call last): 2025-12-04T12:12:57.7999493Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.7999712Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.7999922Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.7999926Z 2025-12-04T12:12:57.8000147Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8001234Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8001324Z 2025-12-04T12:12:57.8001601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8001815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8001927Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8002053Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8002446Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8002665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8002840Z graph_break [] 2025-12-04T12:12:57.8003050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8003785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8003883Z warnings.warn( 2025-12-04T12:12:57.8004420Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8004551Z Traceback (most recent call last): 2025-12-04T12:12:57.8005011Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8005202Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8005423Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8005430Z 2025-12-04T12:12:57.8005639Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8006560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8006565Z 2025-12-04T12:12:57.8006824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8007036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8007165Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8007278Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8007620Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8007833Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8007929Z graph_break [] 2025-12-04T12:12:57.8008154Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8008870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8008968Z warnings.warn( 2025-12-04T12:12:57.8009189Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8009295Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8009420Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8009633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8010008Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8010117Z graph_break [] 2025-12-04T12:12:57.8010327Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8011079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8011192Z warnings.warn( 2025-12-04T12:12:57.8011330Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8011879Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8012001Z Traceback (most recent call last): 2025-12-04T12:12:57.8012491Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8012699Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8012909Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8012914Z 2025-12-04T12:12:57.8013134Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8014051Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8014085Z 2025-12-04T12:12:57.8014344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8014565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8014673Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8014785Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8015126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8015341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8015450Z graph_break [] 2025-12-04T12:12:57.8015658Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8016374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8016488Z warnings.warn( 
2025-12-04T12:12:57.8016696Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8016816Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8016927Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8017140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8017485Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8017579Z graph_break [] 2025-12-04T12:12:57.8017785Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8018504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8018600Z warnings.warn( 2025-12-04T12:12:57.8018809Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8018930Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8019040Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8019264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8019589Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8019682Z graph_break [] 2025-12-04T12:12:57.8019904Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8020609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8020745Z warnings.warn( 2025-12-04T12:12:57.8021557Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml - 2025-12-04T12:12:57.8022318Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8023385Z FAILED [0.1579s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8023391Z 2025-12-04T12:12:57.8023599Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8024563Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8024573Z 2025-12-04T12:12:57.8024832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8025006Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8025217Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8025346Z Got exit code 1 2025-12-04T12:12:57.8025463Z Retrying single test... 
2025-12-04T12:12:57.8026088Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml 2025-12-04T12:12:57.8026250Z ============================= test session starts ============================== 2025-12-04T12:12:57.8026607Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8026714Z cachedir: .pytest_cache 2025-12-04T12:12:57.8027224Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8027361Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8027466Z configfile: pytest.ini 2025-12-04T12:12:57.8028058Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8028282Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8029277Z stepcurrent: skipping 76 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8029400Z Running 1 items in this shard 2025-12-04T12:12:57.8029407Z 2025-12-04T12:12:57.8030285Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5502s] [100%] 2025-12-04T12:12:57.8031170Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1603s] [100%] 2025-12-04T12:12:57.8031968Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1563s] [100%] 2025-12-04T12:12:57.8031975Z 2025-12-04T12:12:57.8032123Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8032663Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8032783Z Traceback (most recent call last): 2025-12-04T12:12:57.8033256Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8033504Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8033708Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8033725Z 2025-12-04T12:12:57.8033932Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8034879Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8034884Z 2025-12-04T12:12:57.8035156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8035369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8035506Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8035634Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8035968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8036206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8036301Z graph_break [] 2025-12-04T12:12:57.8036511Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8037245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8037377Z warnings.warn( 2025-12-04T12:12:57.8037911Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8038043Z Traceback (most recent call last): 2025-12-04T12:12:57.8038504Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8038712Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8038921Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8038926Z 2025-12-04T12:12:57.8039137Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8040066Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8040073Z 2025-12-04T12:12:57.8040332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8040558Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8040669Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8040784Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8041134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8041351Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8041463Z graph_break [] 2025-12-04T12:12:57.8041672Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8042467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8042588Z warnings.warn( 2025-12-04T12:12:57.8042797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8042903Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8043034Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8043248Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8043576Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8043690Z graph_break [] 2025-12-04T12:12:57.8043903Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8044674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8044774Z warnings.warn( 2025-12-04T12:12:57.8044915Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8045504Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8045627Z Traceback (most recent call last): 2025-12-04T12:12:57.8046100Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8046293Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8046542Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8046548Z 2025-12-04T12:12:57.8046768Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8047683Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8047688Z 2025-12-04T12:12:57.8047958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8048200Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8048310Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8048435Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8048764Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8048975Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8049086Z graph_break [] 2025-12-04T12:12:57.8049294Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8050023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8050124Z warnings.warn( 
2025-12-04T12:12:57.8050332Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8050451Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8050566Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8050779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8051117Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8051211Z graph_break [] 2025-12-04T12:12:57.8051433Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8052143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8052242Z warnings.warn( 2025-12-04T12:12:57.8052462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8052569Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8052683Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8052906Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8053238Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8053353Z graph_break [] 2025-12-04T12:12:57.8053560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8054270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8054384Z warnings.warn( 2025-12-04T12:12:57.8055180Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml - 2025-12-04T12:12:57.8055387Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8056471Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8056479Z 2025-12-04T12:12:57.8056689Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8057620Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8057626Z 2025-12-04T12:12:57.8057912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8058103Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8058297Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.8058394Z Got exit code 1 2025-12-04T12:12:57.8059239Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8059669Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8060300Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml 2025-12-04T12:12:57.8060458Z ============================= test session starts ============================== 2025-12-04T12:12:57.8060797Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8060915Z cachedir: .pytest_cache 2025-12-04T12:12:57.8061421Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8061543Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8061660Z configfile: pytest.ini 2025-12-04T12:12:57.8062237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8062470Z collecting ... collected 380 items / 77 deselected / 303 selected 2025-12-04T12:12:57.8062610Z stepcurrent: skipping 77 already run items. 2025-12-04T12:12:57.8062720Z Running 98 items in this shard 2025-12-04T12:12:57.8062726Z 2025-12-04T12:12:57.8063731Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.8064717Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) 
[ 2%] 2025-12-04T12:12:57.8065610Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5611s] [ 3%] 2025-12-04T12:12:57.8066489Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1595s] [ 3%] 2025-12-04T12:12:57.8067303Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1587s] [ 3%] 2025-12-04T12:12:57.8067308Z 2025-12-04T12:12:57.8067446Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8068018Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8068150Z Traceback (most recent call last): 2025-12-04T12:12:57.8068639Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8068851Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8069058Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8069063Z 2025-12-04T12:12:57.8069270Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8070242Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8070248Z 2025-12-04T12:12:57.8070507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8070736Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.8070844Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8070955Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8071299Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8071545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8071640Z graph_break [] 2025-12-04T12:12:57.8071865Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8072584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8072696Z warnings.warn( 2025-12-04T12:12:57.8073233Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8073353Z Traceback (most recent call last): 2025-12-04T12:12:57.8073826Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8074015Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8074224Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8074242Z 2025-12-04T12:12:57.8074451Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8075362Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8075368Z 2025-12-04T12:12:57.8075640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8075852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8075976Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8076087Z stats 
[('calls_captured', 10)] 2025-12-04T12:12:57.8076420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8076644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8076745Z graph_break [] 2025-12-04T12:12:57.8076953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8077677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8077776Z warnings.warn( 2025-12-04T12:12:57.8077984Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8078105Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8078217Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8078478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8078804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8078899Z graph_break [] 2025-12-04T12:12:57.8079120Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8079864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8079963Z warnings.warn( 2025-12-04T12:12:57.8080113Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8080681Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8080814Z Traceback (most recent call last): 2025-12-04T12:12:57.8081274Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8081469Z 
act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8081685Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8081690Z 2025-12-04T12:12:57.8081895Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8082923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8082930Z 2025-12-04T12:12:57.8083189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8083397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8083521Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8083632Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8083973Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8084189Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8084284Z graph_break [] 2025-12-04T12:12:57.8084505Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8085218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8085317Z warnings.warn( 2025-12-04T12:12:57.8085537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8085645Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8085772Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8085988Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8086314Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8086432Z graph_break 
[] 2025-12-04T12:12:57.8086641Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8087351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8087468Z warnings.warn( 2025-12-04T12:12:57.8087675Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8087795Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8087906Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8088121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8088459Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8088556Z graph_break [] 2025-12-04T12:12:57.8088765Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8089532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8089629Z warnings.warn( 2025-12-04T12:12:57.8090476Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml - 2025-12-04T12:12:57.8090646Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8091687Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8091693Z 2025-12-04T12:12:57.8091948Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8092864Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python 
test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8092872Z 2025-12-04T12:12:57.8093140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8093316Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8093558Z ============= 1 failed, 2 skipped, 77 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:57.8093669Z Got exit code 1 2025-12-04T12:12:57.8093776Z Retrying single test... 2025-12-04T12:12:57.8094414Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml 2025-12-04T12:12:57.8094576Z ============================= test session starts ============================== 2025-12-04T12:12:57.8094913Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8095039Z cachedir: .pytest_cache 2025-12-04T12:12:57.8095547Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8095666Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8095785Z configfile: pytest.ini 2025-12-04T12:12:57.8096365Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8096599Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8097597Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8097710Z Running 1 items in this shard 2025-12-04T12:12:57.8097714Z 2025-12-04T12:12:57.8098608Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5419s] [100%] 2025-12-04T12:12:57.8099485Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [100%] 2025-12-04T12:12:57.8100308Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [100%] 2025-12-04T12:12:57.8100313Z 2025-12-04T12:12:57.8100457Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8101223Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8101347Z Traceback (most recent call last): 2025-12-04T12:12:57.8101889Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8102102Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8102313Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8102318Z 2025-12-04T12:12:57.8102583Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8103506Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8103511Z 2025-12-04T12:12:57.8103773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8104044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8104160Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8104275Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8104623Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8104844Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8104959Z graph_break [] 2025-12-04T12:12:57.8105173Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8105938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8106055Z warnings.warn( 2025-12-04T12:12:57.8106595Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8106731Z Traceback (most recent call last): 2025-12-04T12:12:57.8107194Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8107389Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8107608Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8107613Z 2025-12-04T12:12:57.8107819Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8108735Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8108755Z 2025-12-04T12:12:57.8109013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8109223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8109346Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8109460Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8109789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8110015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8110109Z graph_break [] 2025-12-04T12:12:57.8110329Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8111046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8111145Z warnings.warn( 2025-12-04T12:12:57.8111364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8111470Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8111580Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8111805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8112132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8112239Z graph_break [] 2025-12-04T12:12:57.8112480Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8113185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8113294Z warnings.warn( 2025-12-04T12:12:57.8113462Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8114007Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8114136Z Traceback (most recent call last): 2025-12-04T12:12:57.8114595Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8114830Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8115036Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8115041Z 2025-12-04T12:12:57.8115249Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8116177Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8116182Z 2025-12-04T12:12:57.8116474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8116697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8116806Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8116939Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8117402Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8117621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8117716Z graph_break [] 2025-12-04T12:12:57.8117939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8118652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8118764Z warnings.warn( 
2025-12-04T12:12:57.8118975Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8119085Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8119209Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8119425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8119755Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8119864Z graph_break [] 2025-12-04T12:12:57.8120073Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8120794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8120893Z warnings.warn( 2025-12-04T12:12:57.8121100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8121218Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8121330Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8121548Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8121888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8121982Z graph_break [] 2025-12-04T12:12:57.8122263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8122977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8123077Z warnings.warn( 2025-12-04T12:12:57.8123889Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml - 2025-12-04T12:12:57.8124124Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8125231Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8125240Z 2025-12-04T12:12:57.8125452Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8126400Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8126420Z 2025-12-04T12:12:57.8126681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8126859Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8127067Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:57.8127166Z Got exit code 1 2025-12-04T12:12:57.8127271Z Retrying single test... 
2025-12-04T12:12:57.8127954Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml 2025-12-04T12:12:57.8128112Z ============================= test session starts ============================== 2025-12-04T12:12:57.8128464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8128569Z cachedir: .pytest_cache 2025-12-04T12:12:57.8129078Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8129210Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8129318Z configfile: pytest.ini 2025-12-04T12:12:57.8129891Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8130125Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8131126Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8131253Z Running 1 items in this shard 2025-12-04T12:12:57.8131258Z 2025-12-04T12:12:57.8132142Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5722s] [100%] 2025-12-04T12:12:57.8133010Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%] 2025-12-04T12:12:57.8133824Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1586s] [100%] 2025-12-04T12:12:57.8133833Z 2025-12-04T12:12:57.8133970Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8134520Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8134640Z Traceback (most recent call last): 2025-12-04T12:12:57.8135114Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8135309Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8135515Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8135581Z 2025-12-04T12:12:57.8135804Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8136747Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8136755Z 2025-12-04T12:12:57.8137029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8137244Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8137353Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8137478Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8137841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8138055Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8138167Z graph_break [] 2025-12-04T12:12:57.8138378Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8139111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8139208Z warnings.warn( 2025-12-04T12:12:57.8139782Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8139915Z Traceback (most recent call last): 2025-12-04T12:12:57.8140374Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8140563Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8140784Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8140789Z 2025-12-04T12:12:57.8140995Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8141924Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8141929Z 2025-12-04T12:12:57.8142191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8142404Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8142523Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8142634Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8142975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8143187Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8143285Z graph_break [] 2025-12-04T12:12:57.8143506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8144221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8144317Z warnings.warn( 2025-12-04T12:12:57.8144537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8144647Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8144774Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8144984Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8145310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8145418Z graph_break [] 2025-12-04T12:12:57.8145629Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8146345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8146498Z warnings.warn( 2025-12-04T12:12:57.8146640Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8147190Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8147314Z Traceback (most recent call last): 2025-12-04T12:12:57.8147806Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8148011Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8148219Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8148225Z 2025-12-04T12:12:57.8148446Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8149394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8149402Z 2025-12-04T12:12:57.8149662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8149883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8149991Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8150145Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8150477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8150690Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8150798Z graph_break [] 2025-12-04T12:12:57.8151007Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8151720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8151832Z warnings.warn( 
2025-12-04T12:12:57.8152044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8152161Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8152274Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8152487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8152849Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8152945Z graph_break [] 2025-12-04T12:12:57.8153153Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8153878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8153979Z warnings.warn( 2025-12-04T12:12:57.8154205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8154314Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8154427Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8154656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8154985Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8155080Z graph_break [] 2025-12-04T12:12:57.8155307Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8156015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8156127Z warnings.warn( 2025-12-04T12:12:57.8156934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml - 2025-12-04T12:12:57.8157101Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8158195Z FAILED [0.1586s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8158201Z 2025-12-04T12:12:57.8158448Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8159382Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8159387Z 2025-12-04T12:12:57.8159647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8159854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8160065Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8160164Z Got exit code 1 2025-12-04T12:12:57.8161012Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8161412Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8162075Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml 2025-12-04T12:12:57.8162319Z ============================= test session starts ============================== 2025-12-04T12:12:57.8162664Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8162787Z cachedir: .pytest_cache 2025-12-04T12:12:57.8163299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8163419Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8163545Z configfile: pytest.ini 2025-12-04T12:12:57.8164118Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8164339Z collecting ... collected 380 items / 80 deselected / 300 selected 2025-12-04T12:12:57.8164497Z stepcurrent: skipping 80 already run items. 2025-12-04T12:12:57.8164610Z Running 95 items in this shard 2025-12-04T12:12:57.8164616Z 2025-12-04T12:12:57.8165519Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5436s] [ 1%] 2025-12-04T12:12:57.8166394Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [ 1%] 2025-12-04T12:12:57.8167197Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1558s] [ 1%] 2025-12-04T12:12:57.8167216Z 2025-12-04T12:12:57.8167356Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8167903Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8168042Z Traceback (most recent call last): 2025-12-04T12:12:57.8168503Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8168699Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8168925Z ValueError: not enough values to unpack (expected 2, got 
0) 2025-12-04T12:12:57.8168929Z 2025-12-04T12:12:57.8169139Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8170110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8170115Z 2025-12-04T12:12:57.8170374Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8170696Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8170807Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8170916Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8171262Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8171473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8171598Z graph_break [] 2025-12-04T12:12:57.8171823Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8172539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8172640Z warnings.warn( 2025-12-04T12:12:57.8173189Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8173360Z Traceback (most recent call last): 2025-12-04T12:12:57.8173838Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8174027Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8174232Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8174237Z 2025-12-04T12:12:57.8174460Z To execute this test, run the following from 
the base repo dir: 2025-12-04T12:12:57.8175379Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8175386Z 2025-12-04T12:12:57.8175656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8175866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8175979Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8176102Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8176432Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8176657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8176753Z graph_break [] 2025-12-04T12:12:57.8176964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8177694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8177792Z warnings.warn( 2025-12-04T12:12:57.8178003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8178122Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8178232Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8178460Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8178789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8178883Z graph_break [] 2025-12-04T12:12:57.8179101Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8179808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 
does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8179907Z warnings.warn( 2025-12-04T12:12:57.8180058Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8180627Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8180757Z Traceback (most recent call last): 2025-12-04T12:12:57.8181216Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8181442Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8181665Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8181670Z 2025-12-04T12:12:57.8181878Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8182832Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8182838Z 2025-12-04T12:12:57.8183096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8183309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8183431Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8183542Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8183870Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8184129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8184224Z graph_break [] 2025-12-04T12:12:57.8184445Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8185160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla 
T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8185260Z warnings.warn( 2025-12-04T12:12:57.8185481Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8185590Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8185702Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8185928Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8186257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8186365Z graph_break [] 2025-12-04T12:12:57.8186576Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8187282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8187392Z warnings.warn( 2025-12-04T12:12:57.8187599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8187709Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8187831Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8188043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8188385Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8188479Z graph_break [] 2025-12-04T12:12:57.8188685Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8189407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8189506Z warnings.warn( 2025-12-04T12:12:57.8190304Z - generated xml file: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml - 2025-12-04T12:12:57.8190483Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8191524Z FAILED [0.1558s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8191569Z 2025-12-04T12:12:57.8191792Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8192736Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8192744Z 2025-12-04T12:12:57.8193018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8193193Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8193383Z ================== 1 failed, 80 deselected, 2 rerun in 4.91s =================== 2025-12-04T12:12:57.8193522Z Got exit code 1 2025-12-04T12:12:57.8193628Z Retrying single test... 
2025-12-04T12:12:57.8194252Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml 2025-12-04T12:12:57.8194423Z ============================= test session starts ============================== 2025-12-04T12:12:57.8194765Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8194887Z cachedir: .pytest_cache 2025-12-04T12:12:57.8195427Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8195545Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8195662Z configfile: pytest.ini 2025-12-04T12:12:57.8196238Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8196475Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8197466Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8197580Z Running 1 items in this shard 2025-12-04T12:12:57.8197585Z 2025-12-04T12:12:57.8198481Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5605s] [100%] 2025-12-04T12:12:57.8199357Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1647s] [100%] 2025-12-04T12:12:57.8200169Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1576s] [100%] 2025-12-04T12:12:57.8200174Z 2025-12-04T12:12:57.8200310Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8201049Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8201171Z Traceback (most recent call last): 2025-12-04T12:12:57.8201635Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8201844Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8202050Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8202054Z 2025-12-04T12:12:57.8202358Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8203292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8203298Z 2025-12-04T12:12:57.8203649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8203875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8203987Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8204101Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8204489Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8204711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8204826Z graph_break [] 2025-12-04T12:12:57.8205036Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8205796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8205908Z warnings.warn( 2025-12-04T12:12:57.8206443Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8206566Z Traceback (most recent call last): 2025-12-04T12:12:57.8207042Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8207233Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8207494Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8207499Z 2025-12-04T12:12:57.8207705Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8208620Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8208625Z 2025-12-04T12:12:57.8208898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8209109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8209232Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8209344Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8209677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8215216Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8215357Z graph_break [] 2025-12-04T12:12:57.8215587Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8216332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8216436Z warnings.warn( 2025-12-04T12:12:57.8216658Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8216788Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8216904Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8217144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8217478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8217580Z graph_break [] 2025-12-04T12:12:57.8217811Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8218526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8218626Z warnings.warn( 2025-12-04T12:12:57.8218783Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8219328Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8219465Z Traceback (most recent call last): 2025-12-04T12:12:57.8219928Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8220213Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8220438Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8220445Z 2025-12-04T12:12:57.8220697Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8221637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8221643Z 2025-12-04T12:12:57.8221909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8222177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8222303Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8222417Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8222754Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8222966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8223067Z graph_break [] 2025-12-04T12:12:57.8223290Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8224041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8224140Z warnings.warn( 
2025-12-04T12:12:57.8224360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8224468Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8224579Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8224805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8225133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8225244Z graph_break [] 2025-12-04T12:12:57.8225452Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8226162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8226276Z warnings.warn( 2025-12-04T12:12:57.8226483Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8226590Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8226712Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8226929Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8227271Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8227366Z graph_break [] 2025-12-04T12:12:57.8227574Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8228302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8228398Z warnings.warn( 2025-12-04T12:12:57.8229200Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml - 2025-12-04T12:12:57.8229383Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8230437Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8230446Z 2025-12-04T12:12:57.8230669Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8231583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8231623Z 2025-12-04T12:12:57.8231899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8232101Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8232301Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8232408Z Got exit code 1 2025-12-04T12:12:57.8232510Z Retrying single test... 
2025-12-04T12:12:57.8233139Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml 2025-12-04T12:12:57.8233343Z ============================= test session starts ============================== 2025-12-04T12:12:57.8233691Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8233812Z cachedir: .pytest_cache 2025-12-04T12:12:57.8234318Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8234438Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8234555Z configfile: pytest.ini 2025-12-04T12:12:57.8235161Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8235385Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8236393Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8236505Z Running 1 items in this shard 2025-12-04T12:12:57.8236510Z 2025-12-04T12:12:57.8237412Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5337s] [100%] 2025-12-04T12:12:57.8238294Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%] 2025-12-04T12:12:57.8239110Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [100%] 2025-12-04T12:12:57.8239116Z 2025-12-04T12:12:57.8239255Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8239794Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8239927Z Traceback (most recent call last): 2025-12-04T12:12:57.8240390Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8240598Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8240803Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8240809Z 2025-12-04T12:12:57.8241021Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8241942Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8241947Z 2025-12-04T12:12:57.8242296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8242533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8242649Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8242807Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8243160Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8243374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8243473Z graph_break [] 2025-12-04T12:12:57.8243733Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8244459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8244570Z warnings.warn( 2025-12-04T12:12:57.8245111Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8245260Z Traceback (most recent call last): 2025-12-04T12:12:57.8245735Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8245931Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8246150Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8246155Z 2025-12-04T12:12:57.8246360Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8247277Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8247312Z 2025-12-04T12:12:57.8247581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8247794Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8247917Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8248031Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8248362Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8248589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8248683Z graph_break [] 2025-12-04T12:12:57.8248891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8249622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8249724Z warnings.warn( 2025-12-04T12:12:57.8249953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8250060Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8250170Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8250395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8250723Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8250817Z graph_break [] 2025-12-04T12:12:57.8251039Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8251744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8251853Z warnings.warn( 2025-12-04T12:12:57.8251998Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8252535Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8252664Z Traceback (most recent call last): 2025-12-04T12:12:57.8253123Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8253317Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8253534Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8253571Z 2025-12-04T12:12:57.8253781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8254703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8254740Z 2025-12-04T12:12:57.8255001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8255210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8255330Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8255440Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8255783Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8256029Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8256124Z graph_break [] 2025-12-04T12:12:57.8256349Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8257070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8257169Z warnings.warn( 
2025-12-04T12:12:57.8257392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8257534Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8257660Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8257871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8258199Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8258306Z graph_break [] 2025-12-04T12:12:57.8258516Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8259227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8259342Z warnings.warn( 2025-12-04T12:12:57.8259551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8259670Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8259783Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8260002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8260340Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8260434Z graph_break [] 2025-12-04T12:12:57.8260641Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8261360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8261458Z warnings.warn( 2025-12-04T12:12:57.8262269Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml - 2025-12-04T12:12:57.8262437Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8263479Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8263500Z 2025-12-04T12:12:57.8263710Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8264626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8264631Z 2025-12-04T12:12:57.8264904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8265116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8265320Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.8265417Z Got exit code 1 2025-12-04T12:12:57.8266293Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8266710Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8267338Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml 2025-12-04T12:12:57.8267527Z ============================= test session starts ============================== 2025-12-04T12:12:57.8267882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8267992Z cachedir: .pytest_cache 2025-12-04T12:12:57.8268513Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8268637Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8268778Z configfile: pytest.ini 2025-12-04T12:12:57.8269368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8269590Z collecting ... collected 380 items / 81 deselected / 299 selected 2025-12-04T12:12:57.8269743Z stepcurrent: skipping 81 already run items. 2025-12-04T12:12:57.8269853Z Running 94 items in this shard 2025-12-04T12:12:57.8269859Z 2025-12-04T12:12:57.8270860Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.8271859Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.8272842Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [ 3%] 2025-12-04T12:12:57.8273836Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 4%] 2025-12-04T12:12:57.8274821Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) 
[ 5%] 2025-12-04T12:12:57.8275822Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 6%] 2025-12-04T12:12:57.8276804Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 7%] 2025-12-04T12:12:57.8277798Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [ 8%] 2025-12-04T12:12:57.8278343Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims0 PASSED [6.4807s] [ 9%] 2025-12-04T12:12:57.8278921Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims2 PASSED [1.5991s] [ 10%] 2025-12-04T12:12:57.8279566Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_3layer_split_reduction SKIPPED [0.0034s] (Mix order reduction not enabled) [ 11%] 2025-12-04T12:12:57.8280219Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_XBLOCK_coordest_tuning SKIPPED [0.0028s] (Mix order reduction not enabled) [ 12%] 2025-12-04T12:12:57.8280822Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape0 PASSED [1.1809s] [ 13%] 2025-12-04T12:12:57.8281438Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape1 PASSED [1.1370s] [ 14%] 2025-12-04T12:12:57.8282244Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 
('RERUN', {'yellow': True}) [0.1744s] [ 15%] 2025-12-04T12:12:57.8283001Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1396s] [ 15%] 2025-12-04T12:12:57.8283648Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1378s] [ 15%] 2025-12-04T12:12:57.8283694Z 2025-12-04T12:12:57.8283849Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8284232Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8284351Z Traceback (most recent call last): 2025-12-04T12:12:57.8284893Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8285088Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8285314Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8285319Z 2025-12-04T12:12:57.8285531Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8286297Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8286319Z 2025-12-04T12:12:57.8286577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8286793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8286918Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8287033Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8287251Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8287384Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8287477Z graph_break [] 
2025-12-04T12:12:57.8287689Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8288422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8288521Z warnings.warn( 2025-12-04T12:12:57.8288917Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8289036Z Traceback (most recent call last): 2025-12-04T12:12:57.8289553Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8289759Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8289964Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8289969Z 2025-12-04T12:12:57.8290191Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8290986Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8290991Z 2025-12-04T12:12:57.8291251Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8291506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8291619Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8291734Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8291963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8292080Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8292190Z graph_break [] 2025-12-04T12:12:57.8292432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8293148Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8293262Z warnings.warn( 2025-12-04T12:12:57.8293470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8293577Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8293701Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8293916Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8294078Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8294172Z graph_break [] 2025-12-04T12:12:57.8294379Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8295099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8295197Z warnings.warn( 2025-12-04T12:12:57.8295336Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8295734Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8295851Z Traceback (most recent call last): 2025-12-04T12:12:57.8296385Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8296580Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8296786Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8296791Z 2025-12-04T12:12:57.8297008Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8297772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8297779Z 
2025-12-04T12:12:57.8298050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8298263Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8298374Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8298497Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8298710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8298827Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8298939Z graph_break [] 2025-12-04T12:12:57.8299148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8299868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8299966Z warnings.warn( 2025-12-04T12:12:57.8300175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8300299Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8300411Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8300626Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8300801Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8301187Z graph_break [] 2025-12-04T12:12:57.8301402Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8302223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8302329Z warnings.warn( 2025-12-04T12:12:57.8302556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8302664Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8302777Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8303009Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('not_ok', 1)] 2025-12-04T12:12:57.8303174Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8303270Z graph_break [] 2025-12-04T12:12:57.8303490Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8304202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8304312Z warnings.warn( 2025-12-04T12:12:57.8305113Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml - 2025-12-04T12:12:57.8305321Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8306240Z FAILED [0.1378s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8306247Z 2025-12-04T12:12:57.8306462Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8307238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8307246Z 2025-12-04T12:12:57.8307505Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8307681Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8307929Z ======= 1 failed, 4 passed, 10 skipped, 81 deselected, 2 rerun in 10.95s ======= 2025-12-04T12:12:57.8308025Z Got exit code 1 2025-12-04T12:12:57.8308145Z Retrying single test... 
2025-12-04T12:12:57.8308774Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml 2025-12-04T12:12:57.8308934Z ============================= test session starts ============================== 2025-12-04T12:12:57.8309288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8309395Z cachedir: .pytest_cache 2025-12-04T12:12:57.8309900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8310033Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8310140Z configfile: pytest.ini 2025-12-04T12:12:57.8310729Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8310954Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8311795Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8311920Z Running 1 items in this shard 2025-12-04T12:12:57.8311925Z 2025-12-04T12:12:57.8312656Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [4.5336s] [100%] 2025-12-04T12:12:57.8313467Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1406s] [100%] 2025-12-04T12:12:57.8314143Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1374s] [100%] 2025-12-04T12:12:57.8314150Z 2025-12-04T12:12:57.8314304Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8314687Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8314807Z Traceback (most recent call last): 2025-12-04T12:12:57.8315380Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8315575Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8315788Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8315793Z 2025-12-04T12:12:57.8316013Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8316778Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8316813Z 2025-12-04T12:12:57.8317088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8317303Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8317411Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8317535Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8317655Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8317881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8317977Z graph_break [] 2025-12-04T12:12:57.8318188Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8318915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8319013Z warnings.warn( 2025-12-04T12:12:57.8319399Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8319533Z Traceback (most recent call last): 2025-12-04T12:12:57.8320055Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8320263Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8320471Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8320476Z 2025-12-04T12:12:57.8320685Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8321458Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8321463Z 2025-12-04T12:12:57.8321722Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8321948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8322059Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8322238Z stats [('calls_captured', 3)] 
2025-12-04T12:12:57.8322374Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8322590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8322685Z graph_break [] 2025-12-04T12:12:57.8322909Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8323618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8323774Z warnings.warn( 2025-12-04T12:12:57.8323984Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8324089Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8324212Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8324456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8324576Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8324681Z graph_break [] 2025-12-04T12:12:57.8324888Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8325597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8325736Z warnings.warn( 2025-12-04T12:12:57.8325878Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8326276Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8326398Z Traceback (most recent call last): 2025-12-04T12:12:57.8326920Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8327129Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8327368Z ValueError: not enough values to unpack (expected 2, got 0) 
2025-12-04T12:12:57.8327373Z 2025-12-04T12:12:57.8327594Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8328361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8328366Z 2025-12-04T12:12:57.8328628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8328853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8328966Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8329077Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8329209Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8329427Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8329539Z graph_break [] 2025-12-04T12:12:57.8329749Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8330461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8330572Z warnings.warn( 2025-12-04T12:12:57.8330780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8330892Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8331017Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8331234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8331365Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8331459Z graph_break [] 2025-12-04T12:12:57.8331667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8332384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, 
skipping 2025-12-04T12:12:57.8332492Z warnings.warn( 2025-12-04T12:12:57.8332708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8332814Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8332924Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8333151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8333271Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8333363Z graph_break [] 2025-12-04T12:12:57.8333586Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8334333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8334445Z warnings.warn( 2025-12-04T12:12:57.8335280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml - 2025-12-04T12:12:57.8335449Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8336364Z FAILED [0.1374s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8336370Z 2025-12-04T12:12:57.8336611Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8337381Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8337388Z 2025-12-04T12:12:57.8337649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8337824Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.8338070Z ================== 1 failed, 174 deselected, 2 rerun in 4.86s ================== 2025-12-04T12:12:57.8338167Z Got exit code 1 2025-12-04T12:12:57.8338272Z Retrying single test... 2025-12-04T12:12:57.8338911Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml 2025-12-04T12:12:57.8339071Z ============================= test session starts ============================== 2025-12-04T12:12:57.8339428Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8339535Z cachedir: .pytest_cache 2025-12-04T12:12:57.8340043Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8340178Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8340286Z configfile: pytest.ini 2025-12-04T12:12:57.8340876Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8341103Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8341942Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8342068Z Running 1 items in this shard 2025-12-04T12:12:57.8342073Z 2025-12-04T12:12:57.8342810Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [4.5108s] [100%] 2025-12-04T12:12:57.8343552Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1396s] [100%] 2025-12-04T12:12:57.8344200Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1364s] [100%] 2025-12-04T12:12:57.8344208Z 2025-12-04T12:12:57.8344346Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8344738Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8344855Z Traceback (most recent call last): 2025-12-04T12:12:57.8345383Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8345576Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8345816Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8345821Z 2025-12-04T12:12:57.8346040Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8346835Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8346844Z 2025-12-04T12:12:57.8347115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8347330Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8347438Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8347557Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8347703Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8347921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8348027Z graph_break [] 2025-12-04T12:12:57.8348238Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8348964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8349062Z warnings.warn( 2025-12-04T12:12:57.8349490Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8349620Z Traceback (most recent call last): 2025-12-04T12:12:57.8350135Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8350328Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8350548Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8350554Z 2025-12-04T12:12:57.8350762Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8351540Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8351545Z 2025-12-04T12:12:57.8351800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8352016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8352138Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8352249Z stats [('calls_captured', 3)] 
2025-12-04T12:12:57.8352382Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8352600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8352695Z graph_break [] 2025-12-04T12:12:57.8352916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8353623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8353725Z warnings.warn( 2025-12-04T12:12:57.8353945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8354052Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8354176Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8354391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8354510Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8354615Z graph_break [] 2025-12-04T12:12:57.8354822Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8355531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8355639Z warnings.warn( 2025-12-04T12:12:57.8355776Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8356170Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _ 2025-12-04T12:12:57.8356399Z Traceback (most recent call last): 2025-12-04T12:12:57.8356916Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias 2025-12-04T12:12:57.8357149Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8357358Z ValueError: not enough values to unpack (expected 2, got 0) 
2025-12-04T12:12:57.8357363Z 2025-12-04T12:12:57.8357570Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8358347Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8358351Z 2025-12-04T12:12:57.8358638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8358861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8358972Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8359082Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8359210Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8359424Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8359561Z graph_break [] 2025-12-04T12:12:57.8359771Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8360483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8360593Z warnings.warn( 2025-12-04T12:12:57.8360798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8360908Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8361031Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8361246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8361365Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8361471Z graph_break [] 2025-12-04T12:12:57.8361677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8362479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, 
skipping 2025-12-04T12:12:57.8362580Z warnings.warn( 2025-12-04T12:12:57.8362787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8362910Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8363021Z stats [('calls_captured', 3)] 2025-12-04T12:12:57.8363236Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8363369Z inductor [('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8363463Z graph_break [] 2025-12-04T12:12:57.8363685Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8364393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8364491Z warnings.warn( 2025-12-04T12:12:57.8365304Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml - 2025-12-04T12:12:57.8365471Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8366384Z FAILED [0.1364s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8366391Z 2025-12-04T12:12:57.8366601Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8367359Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8367404Z 2025-12-04T12:12:57.8367674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8367847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.8368084Z ================== 1 failed, 174 deselected, 2 rerun in 4.84s ================== 2025-12-04T12:12:57.8368180Z Got exit code 1 2025-12-04T12:12:57.8368862Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 2025-12-04T12:12:57.8369275Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8369928Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml 2025-12-04T12:12:57.8370103Z ============================= test session starts ============================== 2025-12-04T12:12:57.8370445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8370550Z cachedir: .pytest_cache 2025-12-04T12:12:57.8371073Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8371224Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8371331Z configfile: pytest.ini 2025-12-04T12:12:57.8371917Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8372138Z collecting ... collected 380 items / 96 deselected / 284 selected 2025-12-04T12:12:57.8372294Z stepcurrent: skipping 96 already run items. 
2025-12-04T12:12:57.8372404Z Running 79 items in this shard 2025-12-04T12:12:57.8372409Z 2025-12-04T12:12:57.8373056Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0 PASSED [5.4991s] [ 1%] 2025-12-04T12:12:57.8373705Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0 PASSED [1.0878s] [ 2%] 2025-12-04T12:12:57.8374436Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims0 SKIPPED [0.0032s] (Mix order reduction not enabled) [ 3%] 2025-12-04T12:12:57.8375175Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2 SKIPPED [0.0028s] (Mix order reduction not enabled) [ 5%] 2025-12-04T12:12:57.8375852Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0 PASSED [0.3674s] [ 6%] 2025-12-04T12:12:57.8376521Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0 PASSED [0.4677s] [ 7%] 2025-12-04T12:12:57.8377201Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape1 PASSED [0.4821s] [ 8%] 2025-12-04T12:12:57.8377884Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape1 PASSED [0.5381s] [ 10%] 2025-12-04T12:12:57.8378568Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape1 PASSED [0.5491s] [ 11%] 2025-12-04T12:12:57.8379236Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape0 PASSED [0.5275s] [ 
12%] 2025-12-04T12:12:57.8380034Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [ 13%] 2025-12-04T12:12:57.8380745Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_True_shape0 PASSED [0.5533s] [ 15%] 2025-12-04T12:12:57.8381441Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0 PASSED [0.5751s] [ 16%] 2025-12-04T12:12:57.8382123Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2 PASSED [0.7191s] [ 17%] 2025-12-04T12:12:57.8382795Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape1 PASSED [0.5642s] [ 18%] 2025-12-04T12:12:57.8383495Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape2 PASSED [0.2829s] [ 20%] 2025-12-04T12:12:57.8384155Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0 PASSED [0.2914s] [ 21%] 2025-12-04T12:12:57.8384824Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape1 PASSED [0.2882s] [ 22%] 2025-12-04T12:12:57.8385335Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_multi_workspace_allocation PASSED [0.6593s] [ 24%] 2025-12-04T12:12:57.8385792Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_non_contiguous_input PASSED [0.8390s] [ 25%] 2025-12-04T12:12:57.8386717Z 
inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.2270s] [ 26%] 2025-12-04T12:12:57.8387617Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [ 26%] 2025-12-04T12:12:57.8388459Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1569s] [ 26%] 2025-12-04T12:12:57.8388469Z 2025-12-04T12:12:57.8388609Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8389174Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8389308Z Traceback (most recent call last): 2025-12-04T12:12:57.8389773Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8389978Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8390185Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8390192Z 2025-12-04T12:12:57.8390400Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8391360Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8391368Z 2025-12-04T12:12:57.8391628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8391854Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T12:12:57.8391963Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8392074Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8392304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8392636Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8392741Z graph_break [] 2025-12-04T12:12:57.8392986Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8393704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8393813Z warnings.warn( 2025-12-04T12:12:57.8394407Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8394527Z Traceback (most recent call last): 2025-12-04T12:12:57.8395001Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8395194Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8395452Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8395458Z 2025-12-04T12:12:57.8395666Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8396614Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8396619Z 2025-12-04T12:12:57.8396892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8397135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8397256Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8397369Z stats 
[('calls_captured', 10)] 2025-12-04T12:12:57.8397587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8397932Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8398029Z graph_break [] 2025-12-04T12:12:57.8398239Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8398966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8399065Z warnings.warn( 2025-12-04T12:12:57.8399287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8399397Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8399509Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8399738Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8400066Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8400162Z graph_break [] 2025-12-04T12:12:57.8400386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8401449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8401570Z warnings.warn( 2025-12-04T12:12:57.8401711Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8402338Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8402481Z Traceback (most recent call last): 2025-12-04T12:12:57.8402944Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:57.8403137Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8403355Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8403361Z 2025-12-04T12:12:57.8403573Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8404534Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8404640Z 2025-12-04T12:12:57.8404903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8405129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8405239Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8405394Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8405624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8405951Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8406049Z graph_break [] 2025-12-04T12:12:57.8406272Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8407029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8407145Z warnings.warn( 2025-12-04T12:12:57.8407352Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8407459Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8407586Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8407800Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8408173Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:57.8408278Z graph_break [] 2025-12-04T12:12:57.8408487Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8409191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8409302Z warnings.warn( 2025-12-04T12:12:57.8409515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8409635Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8409744Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8409957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8410294Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8410389Z graph_break [] 2025-12-04T12:12:57.8410601Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8411317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8411412Z warnings.warn( 2025-12-04T12:12:57.8412227Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml - 2025-12-04T12:12:57.8412394Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8413473Z FAILED [0.1569s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8413490Z 2025-12-04T12:12:57.8413701Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8414648Z 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8414653Z 2025-12-04T12:12:57.8414925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8415102Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8415347Z ======= 1 failed, 17 passed, 3 skipped, 96 deselected, 2 rerun in 14.93s ======= 2025-12-04T12:12:57.8415489Z Got exit code 1 2025-12-04T12:12:57.8415594Z Retrying single test... 2025-12-04T12:12:57.8416225Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml 2025-12-04T12:12:57.8416384Z ============================= test session starts ============================== 2025-12-04T12:12:57.8416756Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8416875Z cachedir: .pytest_cache 2025-12-04T12:12:57.8417385Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8417520Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8417625Z configfile: pytest.ini 2025-12-04T12:12:57.8418231Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8418464Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8419491Z stepcurrent: skipping 116 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8419646Z Running 1 items in this shard 2025-12-04T12:12:57.8419651Z 2025-12-04T12:12:57.8420554Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5621s] [100%] 2025-12-04T12:12:57.8421454Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1646s] [100%] 2025-12-04T12:12:57.8422285Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1585s] [100%] 2025-12-04T12:12:57.8422293Z 2025-12-04T12:12:57.8422430Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8423005Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8423126Z Traceback (most recent call last): 2025-12-04T12:12:57.8423587Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8423790Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8423994Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8423999Z 2025-12-04T12:12:57.8424221Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8425162Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8425171Z 2025-12-04T12:12:57.8425431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8425657Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8425768Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8425892Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8426223Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8426439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8426548Z graph_break [] 2025-12-04T12:12:57.8426759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8427478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8427629Z warnings.warn( 2025-12-04T12:12:57.8428192Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8428322Z Traceback (most recent call last): 2025-12-04T12:12:57.8428816Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8429010Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8429224Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8429229Z 2025-12-04T12:12:57.8429439Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8430426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8430434Z 2025-12-04T12:12:57.8430694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8430904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8431024Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8431171Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8431520Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8431736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8431832Z graph_break [] 2025-12-04T12:12:57.8432051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8432770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8432869Z warnings.warn( 2025-12-04T12:12:57.8433090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8433198Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8433321Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8433532Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8433862Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8433968Z graph_break [] 2025-12-04T12:12:57.8434174Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8434883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8434992Z warnings.warn( 2025-12-04T12:12:57.8435136Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8435712Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8435833Z Traceback (most recent call last): 2025-12-04T12:12:57.8436291Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8436501Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8436708Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8436713Z 2025-12-04T12:12:57.8436920Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8437875Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8437881Z 2025-12-04T12:12:57.8438142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8438398Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8438509Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8438620Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8438965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8439231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8439342Z graph_break [] 2025-12-04T12:12:57.8439549Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8440266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8440376Z warnings.warn( 
2025-12-04T12:12:57.8440613Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8440726Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8440852Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8441065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8441404Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8441500Z graph_break [] 2025-12-04T12:12:57.8441710Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8442550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8442651Z warnings.warn( 2025-12-04T12:12:57.8442863Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8442985Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8443097Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8443325Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8443653Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8443748Z graph_break [] 2025-12-04T12:12:57.8443970Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8444677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8444776Z warnings.warn( 2025-12-04T12:12:57.8445583Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml - 2025-12-04T12:12:57.8445752Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8446837Z FAILED [0.1585s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8446846Z 2025-12-04T12:12:57.8447056Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8448011Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8448018Z 2025-12-04T12:12:57.8448277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8448454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8448659Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8448755Z Got exit code 1 2025-12-04T12:12:57.8448864Z Retrying single test... 
2025-12-04T12:12:57.8449500Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml 2025-12-04T12:12:57.8449699Z ============================= test session starts ============================== 2025-12-04T12:12:57.8450050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8450156Z cachedir: .pytest_cache 2025-12-04T12:12:57.8450692Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8450825Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8450930Z configfile: pytest.ini 2025-12-04T12:12:57.8451505Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8451768Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8452800Z stepcurrent: skipping 116 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8452923Z Running 1 items in this shard 2025-12-04T12:12:57.8452928Z 2025-12-04T12:12:57.8453833Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5950s] [100%] 2025-12-04T12:12:57.8454788Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1680s] [100%] 2025-12-04T12:12:57.8455613Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1645s] [100%] 2025-12-04T12:12:57.8455618Z 2025-12-04T12:12:57.8455756Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8456349Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8456469Z Traceback (most recent call last): 2025-12-04T12:12:57.8456947Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8457144Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8457353Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8457358Z 2025-12-04T12:12:57.8457580Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8458522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8458527Z 2025-12-04T12:12:57.8458802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8459016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8459126Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8459253Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8459590Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8459820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8459918Z graph_break [] 2025-12-04T12:12:57.8460128Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8460866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8460966Z warnings.warn( 2025-12-04T12:12:57.8461530Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8461697Z Traceback (most recent call last): 2025-12-04T12:12:57.8462156Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8462366Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8462605Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8462611Z 2025-12-04T12:12:57.8462819Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8463773Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8463778Z 2025-12-04T12:12:57.8464069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8464297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8464410Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8464522Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8464867Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8465083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8465210Z graph_break [] 2025-12-04T12:12:57.8465432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8466145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8466260Z warnings.warn( 2025-12-04T12:12:57.8466471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8466578Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8466702Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8466918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8467246Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8467355Z graph_break [] 2025-12-04T12:12:57.8467566Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8468293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8468390Z warnings.warn( 2025-12-04T12:12:57.8468528Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8469104Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8469221Z Traceback (most recent call last): 2025-12-04T12:12:57.8469692Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8469887Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8470090Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8470095Z 2025-12-04T12:12:57.8470315Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8471253Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8471258Z 2025-12-04T12:12:57.8471529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8471742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8471851Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8471976Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8472342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8472553Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8472661Z graph_break [] 2025-12-04T12:12:57.8472872Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8473635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8473734Z warnings.warn( 
2025-12-04T12:12:57.8473947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8474070Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8474180Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8474421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8474766Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8474864Z graph_break [] 2025-12-04T12:12:57.8475074Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8475803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8475932Z warnings.warn( 2025-12-04T12:12:57.8476150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8476260Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8476372Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8476597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8476925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8477021Z graph_break [] 2025-12-04T12:12:57.8477242Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8477950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8478058Z warnings.warn( 2025-12-04T12:12:57.8478859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml - 2025-12-04T12:12:57.8479027Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8480110Z FAILED [0.1645s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8480117Z 2025-12-04T12:12:57.8480328Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8481281Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8481288Z 2025-12-04T12:12:57.8481550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8481742Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8481936Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ================== 2025-12-04T12:12:57.8482032Z Got exit code 1 2025-12-04T12:12:57.8482973Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8483380Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8484017Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml 2025-12-04T12:12:57.8484235Z ============================= test session starts ============================== 2025-12-04T12:12:57.8484577Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8484734Z cachedir: .pytest_cache 2025-12-04T12:12:57.8485244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8485365Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8485487Z configfile: pytest.ini 2025-12-04T12:12:57.8486064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8486338Z collecting ... collected 380 items / 117 deselected / 263 selected 2025-12-04T12:12:57.8486484Z stepcurrent: skipping 117 already run items. 2025-12-04T12:12:57.8486596Z Running 58 items in this shard 2025-12-04T12:12:57.8486601Z 2025-12-04T12:12:57.8487632Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 1%] 2025-12-04T12:12:57.8488677Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [ 3%] 2025-12-04T12:12:57.8489701Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0037s] (Skip non-critical tests to save resources.) [ 5%] 2025-12-04T12:12:57.8490704Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 6%] 2025-12-04T12:12:57.8491728Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) 
[ 8%] 2025-12-04T12:12:57.8492732Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 10%] 2025-12-04T12:12:57.8493752Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 12%] 2025-12-04T12:12:57.8494756Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 13%] 2025-12-04T12:12:57.8495763Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 15%] 2025-12-04T12:12:57.8496785Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 17%] 2025-12-04T12:12:57.8497794Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 18%] 2025-12-04T12:12:57.8498847Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) 
[ 20%] 2025-12-04T12:12:57.8499777Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [ 22%] 2025-12-04T12:12:57.8500699Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1638s] [ 22%] 2025-12-04T12:12:57.8501877Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [ 22%] 2025-12-04T12:12:57.8501886Z 2025-12-04T12:12:57.8502042Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8502609Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8502730Z Traceback (most recent call last): 2025-12-04T12:12:57.8503210Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8503452Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8503675Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8503680Z 2025-12-04T12:12:57.8503890Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8504833Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8504841Z 2025-12-04T12:12:57.8505114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8505327Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.8505449Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8505561Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8505897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8506127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8506224Z graph_break [] 2025-12-04T12:12:57.8506435Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8507167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8507267Z warnings.warn( 2025-12-04T12:12:57.8507841Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8507961Z Traceback (most recent call last): 2025-12-04T12:12:57.8508419Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8508627Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8508835Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8508840Z 2025-12-04T12:12:57.8509047Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8510001Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8510007Z 2025-12-04T12:12:57.8510264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8510540Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8510651Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:57.8510763Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8511104Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8511357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8511468Z graph_break [] 2025-12-04T12:12:57.8511678Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8512395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8512506Z warnings.warn( 2025-12-04T12:12:57.8512743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8512851Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8512979Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8513189Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8513530Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8513626Z graph_break [] 2025-12-04T12:12:57.8513838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8514593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8514692Z warnings.warn( 2025-12-04T12:12:57.8514832Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8515414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8515534Z Traceback (most recent call last): 2025-12-04T12:12:57.8516009Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.8516202Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8516406Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8516411Z 2025-12-04T12:12:57.8516634Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8517576Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8517581Z 2025-12-04T12:12:57.8517850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8518062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8518170Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8518293Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8518651Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8518911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8519021Z graph_break [] 2025-12-04T12:12:57.8519237Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8519971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8520068Z warnings.warn( 2025-12-04T12:12:57.8520275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8520396Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8520509Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8520723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8521063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8521200Z graph_break [] 2025-12-04T12:12:57.8521425Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8522241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8522346Z warnings.warn( 2025-12-04T12:12:57.8522571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8522678Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8522791Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8523019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8523376Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8523490Z graph_break [] 2025-12-04T12:12:57.8523702Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8524409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8524522Z warnings.warn( 2025-12-04T12:12:57.8525322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml - 2025-12-04T12:12:57.8525552Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8526627Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8526634Z 2025-12-04T12:12:57.8526846Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:57.8527810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8527816Z 2025-12-04T12:12:57.8528077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8528271Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8528485Z ============ 1 failed, 12 skipped, 117 deselected, 2 rerun in 4.98s ============ 2025-12-04T12:12:57.8528583Z Got exit code 1 2025-12-04T12:12:57.8528703Z Retrying single test... 2025-12-04T12:12:57.8529334Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml 2025-12-04T12:12:57.8529508Z ============================= test session starts ============================== 2025-12-04T12:12:57.8529853Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8529965Z cachedir: .pytest_cache 2025-12-04T12:12:57.8530491Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8530614Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8530726Z configfile: pytest.ini 2025-12-04T12:12:57.8531321Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8531545Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8532583Z stepcurrent: skipping 129 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8532694Z Running 1 items in this shard 2025-12-04T12:12:57.8532740Z 2025-12-04T12:12:57.8533649Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6158s] [100%] 2025-12-04T12:12:57.8534600Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1657s] [100%] 2025-12-04T12:12:57.8535429Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1615s] [100%] 2025-12-04T12:12:57.8535435Z 2025-12-04T12:12:57.8535616Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8536183Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8536320Z Traceback (most recent call last): 2025-12-04T12:12:57.8536781Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8536977Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8537209Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8537326Z 2025-12-04T12:12:57.8537536Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8538495Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8538500Z 2025-12-04T12:12:57.8538760Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8538976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8539102Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8539217Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8539546Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8539774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8539873Z graph_break [] 2025-12-04T12:12:57.8540103Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8540821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8540920Z warnings.warn( 2025-12-04T12:12:57.8541501Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8541619Z Traceback (most recent call last): 2025-12-04T12:12:57.8542087Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8542285Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8542493Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8542498Z 2025-12-04T12:12:57.8542728Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8543677Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8543681Z 2025-12-04T12:12:57.8543956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8544169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8544277Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8544402Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8544774Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8544990Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8545100Z graph_break [] 2025-12-04T12:12:57.8545310Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8546074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8546175Z warnings.warn( 2025-12-04T12:12:57.8546384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8546507Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8546617Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8546858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8547202Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8547299Z graph_break [] 2025-12-04T12:12:57.8547510Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8548231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8548358Z warnings.warn( 2025-12-04T12:12:57.8548510Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8549078Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8549195Z Traceback (most recent call last): 2025-12-04T12:12:57.8549674Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8549869Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8550090Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8550095Z 2025-12-04T12:12:57.8550304Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8551250Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8551256Z 2025-12-04T12:12:57.8551529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8551741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8551865Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8551980Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8552310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8552538Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8552636Z graph_break [] 2025-12-04T12:12:57.8552847Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8553577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8553681Z warnings.warn( 
2025-12-04T12:12:57.8553904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8554014Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8554127Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8554355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8554687Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8554783Z graph_break [] 2025-12-04T12:12:57.8555004Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8555751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8555864Z warnings.warn( 2025-12-04T12:12:57.8556073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8556214Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8556337Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8556551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8556879Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8556988Z graph_break [] 2025-12-04T12:12:57.8557195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8557948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8558048Z warnings.warn( 2025-12-04T12:12:57.8558844Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml - 2025-12-04T12:12:57.8559026Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8560127Z FAILED [0.1615s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8560133Z 2025-12-04T12:12:57.8560359Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8561297Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8561304Z 2025-12-04T12:12:57.8561563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8561754Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8561950Z ================== 1 failed, 174 deselected, 2 rerun in 5.00s ================== 2025-12-04T12:12:57.8562064Z Got exit code 1 2025-12-04T12:12:57.8562248Z Retrying single test... 
2025-12-04T12:12:57.8562883Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml 2025-12-04T12:12:57.8563058Z ============================= test session starts ============================== 2025-12-04T12:12:57.8563402Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8563509Z cachedir: .pytest_cache 2025-12-04T12:12:57.8564031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8564153Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8564273Z configfile: pytest.ini 2025-12-04T12:12:57.8564847Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8565073Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8566111Z stepcurrent: skipping 129 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8566224Z Running 1 items in this shard 2025-12-04T12:12:57.8566231Z 2025-12-04T12:12:57.8567147Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5680s] [100%] 2025-12-04T12:12:57.8568098Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%] 2025-12-04T12:12:57.8568966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1591s] [100%] 2025-12-04T12:12:57.8568974Z 2025-12-04T12:12:57.8569112Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8569676Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8569851Z Traceback (most recent call last): 2025-12-04T12:12:57.8570311Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8570519Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8570723Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8570727Z 2025-12-04T12:12:57.8570936Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8571887Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8571924Z 2025-12-04T12:12:57.8572184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8572410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8572519Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8572632Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8572973Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8573190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8573285Z graph_break [] 2025-12-04T12:12:57.8573510Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8574234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8574350Z warnings.warn( 2025-12-04T12:12:57.8574911Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8575030Z Traceback (most recent call last): 2025-12-04T12:12:57.8575505Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8575697Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8575902Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8575923Z 2025-12-04T12:12:57.8576130Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8577076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8577083Z 2025-12-04T12:12:57.8577356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8577568Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8577678Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8577800Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8578132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8578357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8578488Z graph_break [] 2025-12-04T12:12:57.8578697Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8579430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8579530Z warnings.warn( 2025-12-04T12:12:57.8579775Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8579896Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8580009Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8580234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8580568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8580693Z graph_break [] 2025-12-04T12:12:57.8580917Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8581652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8581754Z warnings.warn( 2025-12-04T12:12:57.8581906Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8582470Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8582638Z Traceback (most recent call last): 2025-12-04T12:12:57.8583094Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8583289Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8583512Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8583517Z 2025-12-04T12:12:57.8583726Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8584680Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8584685Z 2025-12-04T12:12:57.8584945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8585160Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8585285Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8585397Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8585742Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8585957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8586053Z graph_break [] 2025-12-04T12:12:57.8586280Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8586995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8587096Z warnings.warn( 
2025-12-04T12:12:57.8587321Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8587430Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8587563Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8587779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8588108Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8588218Z graph_break [] 2025-12-04T12:12:57.8588428Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8589141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8589295Z warnings.warn( 2025-12-04T12:12:57.8589505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8589629Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8589739Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8589952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8590327Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8590425Z graph_break [] 2025-12-04T12:12:57.8590632Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8591354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8591478Z warnings.warn( 2025-12-04T12:12:57.8592288Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml - 2025-12-04T12:12:57.8592462Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8593530Z FAILED [0.1591s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8593565Z 2025-12-04T12:12:57.8593791Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8594730Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8594735Z 2025-12-04T12:12:57.8595008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8595185Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8595383Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8595489Z Got exit code 1 2025-12-04T12:12:57.8596346Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8596759Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8597381Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml 2025-12-04T12:12:57.8597541Z ============================= test session starts ============================== 2025-12-04T12:12:57.8597899Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8598004Z cachedir: .pytest_cache 2025-12-04T12:12:57.8598528Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8598646Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8598751Z configfile: pytest.ini 2025-12-04T12:12:57.8599339Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8599566Z collecting ... collected 380 items / 130 deselected / 250 selected 2025-12-04T12:12:57.8599710Z stepcurrent: skipping 130 already run items. 2025-12-04T12:12:57.8599833Z Running 45 items in this shard 2025-12-04T12:12:57.8599838Z 2025-12-04T12:12:57.8601080Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.8602171Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [ 4%] 2025-12-04T12:12:57.8603333Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0034s] (Skip non-critical tests to save resources.) [ 6%] 2025-12-04T12:12:57.8604359Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) 
[ 8%] 2025-12-04T12:12:57.8605303Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5510s] [ 11%] 2025-12-04T12:12:57.8606214Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1624s] [ 11%] 2025-12-04T12:12:57.8607031Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1615s] [ 11%] 2025-12-04T12:12:57.8607083Z 2025-12-04T12:12:57.8607227Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8607800Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8607919Z Traceback (most recent call last): 2025-12-04T12:12:57.8608398Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8608593Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8608803Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8608808Z 2025-12-04T12:12:57.8609034Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8609968Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8609975Z 2025-12-04T12:12:57.8610249Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8610465Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.8610576Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8610701Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8611034Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8611245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8611359Z graph_break [] 2025-12-04T12:12:57.8611569Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8612302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8612401Z warnings.warn( 2025-12-04T12:12:57.8612957Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8613087Z Traceback (most recent call last): 2025-12-04T12:12:57.8613545Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8613751Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8613958Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8614012Z 2025-12-04T12:12:57.8614224Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8615164Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8615171Z 2025-12-04T12:12:57.8615482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8615707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8615814Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:57.8615927Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8616274Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8616519Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8616619Z graph_break [] 2025-12-04T12:12:57.8616843Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8617564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8617679Z warnings.warn( 2025-12-04T12:12:57.8617892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8618033Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8618157Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8618373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8618701Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8618809Z graph_break [] 2025-12-04T12:12:57.8619023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8619743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8619842Z warnings.warn( 2025-12-04T12:12:57.8619981Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8620550Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8620669Z Traceback (most recent call last): 2025-12-04T12:12:57.8621125Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.8621332Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8621537Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8621542Z 2025-12-04T12:12:57.8621764Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8622695Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8622702Z 2025-12-04T12:12:57.8622961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8623185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8623294Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8623418Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8623747Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8623960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8624067Z graph_break [] 2025-12-04T12:12:57.8624276Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8624989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8625133Z warnings.warn( 2025-12-04T12:12:57.8625342Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8625463Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8625575Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8625821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8626163Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8626258Z graph_break [] 2025-12-04T12:12:57.8626465Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8627215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8627313Z warnings.warn( 2025-12-04T12:12:57.8627530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8627639Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8627749Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8627972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8628301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8628427Z graph_break [] 2025-12-04T12:12:57.8628645Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8629350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8629458Z warnings.warn( 2025-12-04T12:12:57.8630254Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml - 2025-12-04T12:12:57.8630421Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8631491Z FAILED [0.1615s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8631501Z 2025-12-04T12:12:57.8631711Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:57.8632656Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8632661Z 2025-12-04T12:12:57.8632922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8633095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8633321Z ============ 1 failed, 4 skipped, 130 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:57.8633420Z Got exit code 1 2025-12-04T12:12:57.8633537Z Retrying single test... 2025-12-04T12:12:57.8634161Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml 2025-12-04T12:12:57.8634326Z ============================= test session starts ============================== 2025-12-04T12:12:57.8634680Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8634787Z cachedir: .pytest_cache 2025-12-04T12:12:57.8635298Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8635437Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8635546Z configfile: pytest.ini 2025-12-04T12:12:57.8636139Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8636399Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8637440Z stepcurrent: skipping 134 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8637567Z Running 1 items in this shard 2025-12-04T12:12:57.8637572Z 2025-12-04T12:12:57.8638477Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5556s] [100%] 2025-12-04T12:12:57.8639415Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1615s] [100%] 2025-12-04T12:12:57.8640233Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1590s] [100%] 2025-12-04T12:12:57.8640239Z 2025-12-04T12:12:57.8640386Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8640977Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8641095Z Traceback (most recent call last): 2025-12-04T12:12:57.8641567Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8641760Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8641981Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8641985Z 2025-12-04T12:12:57.8642277Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8643219Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8643224Z 2025-12-04T12:12:57.8643497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8643713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8643836Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8643948Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8644279Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8644509Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8644607Z graph_break [] 2025-12-04T12:12:57.8644819Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8645554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8645658Z warnings.warn( 2025-12-04T12:12:57.8646229Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8646352Z Traceback (most recent call last): 2025-12-04T12:12:57.8646812Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8647020Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8647226Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8647231Z 2025-12-04T12:12:57.8647445Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8648398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8648447Z 2025-12-04T12:12:57.8648708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8648934Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8649076Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8649190Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8649537Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8649754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8649865Z graph_break [] 2025-12-04T12:12:57.8650077Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8650828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8650946Z warnings.warn( 2025-12-04T12:12:57.8651157Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8651265Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8651392Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8651610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8651988Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8652087Z graph_break [] 2025-12-04T12:12:57.8652298Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8653026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8653125Z warnings.warn( 2025-12-04T12:12:57.8653265Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8653839Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8653955Z Traceback (most recent call last): 2025-12-04T12:12:57.8654429Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8654625Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8654828Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8654832Z 2025-12-04T12:12:57.8655055Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8655988Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8655993Z 2025-12-04T12:12:57.8656263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8656474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8656584Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8656707Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8657041Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8657268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8657363Z graph_break [] 2025-12-04T12:12:57.8657569Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8658295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8658394Z warnings.warn( 
2025-12-04T12:12:57.8658602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8658776Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8658887Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8659101Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8659440Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8659536Z graph_break [] 2025-12-04T12:12:57.8659788Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8660505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8660603Z warnings.warn( 2025-12-04T12:12:57.8660823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8660963Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8661077Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8661306Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8661640Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8661747Z graph_break [] 2025-12-04T12:12:57.8661959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8662669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8662813Z warnings.warn( 2025-12-04T12:12:57.8663610Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml - 2025-12-04T12:12:57.8663794Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8664857Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8664864Z 2025-12-04T12:12:57.8665075Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8666025Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8666032Z 2025-12-04T12:12:57.8666292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8666485Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8666678Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.8666777Z Got exit code 1 2025-12-04T12:12:57.8666896Z Retrying single test... 
2025-12-04T12:12:57.8667531Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml 2025-12-04T12:12:57.8667703Z ============================= test session starts ============================== 2025-12-04T12:12:57.8668041Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8668152Z cachedir: .pytest_cache 2025-12-04T12:12:57.8668667Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8668786Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8668889Z configfile: pytest.ini 2025-12-04T12:12:57.8669476Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8669700Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8670738Z stepcurrent: skipping 134 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8670890Z Running 1 items in this shard 2025-12-04T12:12:57.8670896Z 2025-12-04T12:12:57.8671829Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5760s] [100%] 2025-12-04T12:12:57.8672742Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1640s] [100%] 2025-12-04T12:12:57.8673600Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1588s] [100%] 2025-12-04T12:12:57.8673605Z 2025-12-04T12:12:57.8673762Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8674319Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8674449Z Traceback (most recent call last): 2025-12-04T12:12:57.8674912Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8675139Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8675359Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8675364Z 2025-12-04T12:12:57.8675572Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8676519Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8676526Z 2025-12-04T12:12:57.8676784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8676993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8677114Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8677229Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8677564Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8677788Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8677882Z graph_break [] 2025-12-04T12:12:57.8678109Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8678826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8678924Z warnings.warn( 2025-12-04T12:12:57.8679490Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8679614Z Traceback (most recent call last): 2025-12-04T12:12:57.8680083Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8680280Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8680486Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8680491Z 2025-12-04T12:12:57.8680711Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8681639Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8681644Z 2025-12-04T12:12:57.8681917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8682232Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8682345Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8682473Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8682801Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8683054Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8683166Z graph_break [] 2025-12-04T12:12:57.8683377Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8684106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8684206Z warnings.warn( 2025-12-04T12:12:57.8684445Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8684569Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8684684Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8684896Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8685235Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8685331Z graph_break [] 2025-12-04T12:12:57.8685553Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8686340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8686438Z warnings.warn( 2025-12-04T12:12:57.8686589Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8687150Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.8687268Z Traceback (most recent call last): 2025-12-04T12:12:57.8687742Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8687934Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8688156Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8688161Z 2025-12-04T12:12:57.8688373Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8689307Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8689326Z 2025-12-04T12:12:57.8689583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8689795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8689918Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8690030Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8690357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8690584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8690680Z graph_break [] 2025-12-04T12:12:57.8690892Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8691620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8691717Z warnings.warn( 
2025-12-04T12:12:57.8691940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8692046Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8692157Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8692383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8692710Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8692840Z graph_break [] 2025-12-04T12:12:57.8693061Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8693806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8693921Z warnings.warn( 2025-12-04T12:12:57.8694129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8694236Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8694359Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8694572Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8694928Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8695037Z graph_break [] 2025-12-04T12:12:57.8695244Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8695964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8696065Z warnings.warn( 2025-12-04T12:12:57.8696869Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml - 2025-12-04T12:12:57.8697098Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8698162Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8698168Z 2025-12-04T12:12:57.8698388Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8699322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8699328Z 2025-12-04T12:12:57.8699586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8699776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8699971Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.8700082Z Got exit code 1 2025-12-04T12:12:57.8701134Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.8701538Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8702185Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml 2025-12-04T12:12:57.8702345Z ============================= test session starts ============================== 2025-12-04T12:12:57.8702717Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8702827Z cachedir: .pytest_cache 2025-12-04T12:12:57.8703336Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8703473Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8703580Z configfile: pytest.ini 2025-12-04T12:12:57.8704159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8704396Z collecting ... collected 380 items / 135 deselected / 245 selected 2025-12-04T12:12:57.8704630Z stepcurrent: skipping 135 already run items. 2025-12-04T12:12:57.8704760Z Running 40 items in this shard 2025-12-04T12:12:57.8704765Z 2025-12-04T12:12:57.8705702Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5947s] [ 2%] 2025-12-04T12:12:57.8706617Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1689s] [ 2%] 2025-12-04T12:12:57.8707431Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1658s] [ 2%] 2025-12-04T12:12:57.8707476Z 2025-12-04T12:12:57.8707615Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8708188Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8708309Z Traceback (most recent call last): 2025-12-04T12:12:57.8708786Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8709026Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8709232Z ValueError: not enough values to unpack 
(expected 2, got 0) 2025-12-04T12:12:57.8709237Z 2025-12-04T12:12:57.8709463Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8710399Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8710404Z 2025-12-04T12:12:57.8710680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8710896Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8711008Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8711135Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8711467Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8711686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8711799Z graph_break [] 2025-12-04T12:12:57.8712012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8714688Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8714795Z return x.grad, w.grad 2025-12-04T12:12:57.8715524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8715623Z warnings.warn( 2025-12-04T12:12:57.8718267Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8718426Z return x.grad, w.grad 2025-12-04T12:12:57.8718979Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8719146Z Traceback (most recent call last): 2025-12-04T12:12:57.8719609Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8719802Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8720023Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8720028Z 2025-12-04T12:12:57.8720241Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8721293Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8721301Z 2025-12-04T12:12:57.8721561Z This message can be suppressed by setting 
PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8721787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8721899Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8722048Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8722458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8722673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8722768Z graph_break [] 2025-12-04T12:12:57.8722992Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8725646Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8725767Z return x.grad, w.grad 2025-12-04T12:12:57.8726478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8726574Z warnings.warn( 2025-12-04T12:12:57.8729228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8729332Z return x.grad, w.grad 2025-12-04T12:12:57.8729565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8729674Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8729801Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8730014Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8730345Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8730459Z graph_break [] 2025-12-04T12:12:57.8730668Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8733352Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8733486Z return x.grad, w.grad 2025-12-04T12:12:57.8734199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8734337Z warnings.warn( 2025-12-04T12:12:57.8736986Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8737136Z return x.grad, w.grad 2025-12-04T12:12:57.8737275Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8737842Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8737961Z Traceback (most recent call last): 2025-12-04T12:12:57.8738417Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8738626Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8738831Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8738837Z 2025-12-04T12:12:57.8739058Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8739995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 
2025-12-04T12:12:57.8740002Z 2025-12-04T12:12:57.8740260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8740484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8740592Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8740707Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8741049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8741262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8741371Z graph_break [] 2025-12-04T12:12:57.8741578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8744228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8744346Z return x.grad, w.grad 2025-12-04T12:12:57.8745059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8745204Z warnings.warn( 2025-12-04T12:12:57.8747866Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8747986Z return x.grad, w.grad 2025-12-04T12:12:57.8748239Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8748349Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8748477Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8748693Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8749037Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8749132Z graph_break [] 2025-12-04T12:12:57.8749382Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8752047Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8752150Z return x.grad, w.grad 2025-12-04T12:12:57.8752878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8752974Z warnings.warn( 2025-12-04T12:12:57.8755626Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8755728Z return x.grad, w.grad 2025-12-04T12:12:57.8755938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8756058Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8756170Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8756400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8756732Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8756828Z graph_break [] 2025-12-04T12:12:57.8757051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8757764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8757861Z warnings.warn( 2025-12-04T12:12:57.8760543Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor 
is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8760682Z return x.grad, w.grad 2025-12-04T12:12:57.8761498Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml - 2025-12-04T12:12:57.8761667Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8762838Z FAILED [0.1658s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8762847Z 2025-12-04T12:12:57.8763059Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8763993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8764047Z 2025-12-04T12:12:57.8764309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8764485Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8764693Z ================== 1 failed, 135 deselected, 2 rerun in 4.98s ================== 2025-12-04T12:12:57.8764790Z Got exit code 1 2025-12-04T12:12:57.8764896Z Retrying single test... 
2025-12-04T12:12:57.8765537Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml 2025-12-04T12:12:57.8765697Z ============================= test session starts ============================== 2025-12-04T12:12:57.8766053Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8766159Z cachedir: .pytest_cache 2025-12-04T12:12:57.8766669Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8766802Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8766906Z configfile: pytest.ini 2025-12-04T12:12:57.8767482Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8767719Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8768735Z stepcurrent: skipping 135 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8768863Z Running 1 items in this shard 2025-12-04T12:12:57.8768868Z 2025-12-04T12:12:57.8769765Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5585s] [100%] 2025-12-04T12:12:57.8770672Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1662s] [100%] 2025-12-04T12:12:57.8771488Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1620s] [100%] 2025-12-04T12:12:57.8771494Z 2025-12-04T12:12:57.8771630Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8772225Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8772344Z Traceback (most recent call last): 2025-12-04T12:12:57.8772843Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8773039Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8773243Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8773248Z 2025-12-04T12:12:57.8773467Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8774430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8774435Z 2025-12-04T12:12:57.8774708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8774922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8775031Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8775155Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8775488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8775745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8775842Z graph_break [] 2025-12-04T12:12:57.8776051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8778729Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8778838Z return x.grad, w.grad 2025-12-04T12:12:57.8779573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8779677Z warnings.warn( 2025-12-04T12:12:57.8782330Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8782436Z return x.grad, w.grad 2025-12-04T12:12:57.8782992Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8783130Z Traceback (most recent call last): 2025-12-04T12:12:57.8783589Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8783800Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8784007Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8784012Z 2025-12-04T12:12:57.8784224Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8785169Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8785227Z 2025-12-04T12:12:57.8785487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8785740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8785853Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8785967Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8791619Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8791882Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.8791984Z graph_break [] 2025-12-04T12:12:57.8792303Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8794977Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8795152Z return x.grad, w.grad 2025-12-04T12:12:57.8795878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8795977Z warnings.warn( 2025-12-04T12:12:57.8798642Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8798752Z return x.grad, w.grad 2025-12-04T12:12:57.8798983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8799092Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8799222Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8799444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8799777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8799891Z graph_break [] 2025-12-04T12:12:57.8800106Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8803089Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8803201Z return x.grad, w.grad 2025-12-04T12:12:57.8803926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8804042Z warnings.warn( 2025-12-04T12:12:57.8806750Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8806940Z return x.grad, w.grad 2025-12-04T12:12:57.8807085Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8807696Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8807818Z Traceback (most recent call last): 2025-12-04T12:12:57.8808270Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8808496Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8808708Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8808714Z 2025-12-04T12:12:57.8808927Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8809915Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8809921Z 2025-12-04T12:12:57.8810180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8810410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8810519Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8810630Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8810977Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8811193Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8811288Z graph_break [] 2025-12-04T12:12:57.8811513Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8814158Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8814277Z return x.grad, w.grad 2025-12-04T12:12:57.8814996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8815108Z warnings.warn( 2025-12-04T12:12:57.8817737Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8817853Z return x.grad, w.grad 2025-12-04T12:12:57.8818066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8818208Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8818335Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8818554Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8818887Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8819028Z graph_break [] 2025-12-04T12:12:57.8819239Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8821922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8822029Z return x.grad, w.grad 2025-12-04T12:12:57.8822756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8822888Z warnings.warn( 2025-12-04T12:12:57.8825515Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8825634Z return x.grad, w.grad 2025-12-04T12:12:57.8825843Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8825962Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8826073Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8826291Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8826635Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8826730Z graph_break [] 2025-12-04T12:12:57.8826939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8827663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8827761Z warnings.warn( 2025-12-04T12:12:57.8830400Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8830508Z return x.grad, w.grad 2025-12-04T12:12:57.8831324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml - 2025-12-04T12:12:57.8831495Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8832564Z FAILED [0.1620s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8832616Z 2025-12-04T12:12:57.8832827Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8833787Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8833795Z 2025-12-04T12:12:57.8834069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8834244Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8834514Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.8834612Z Got exit code 1 2025-12-04T12:12:57.8834715Z Retrying single test... 
2025-12-04T12:12:57.8835352Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml 2025-12-04T12:12:57.8835512Z ============================= test session starts ============================== 2025-12-04T12:12:57.8835853Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8836006Z cachedir: .pytest_cache 2025-12-04T12:12:57.8836514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8836646Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8836753Z configfile: pytest.ini 2025-12-04T12:12:57.8837332Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8837570Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8838584Z stepcurrent: skipping 135 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8838710Z Running 1 items in this shard 2025-12-04T12:12:57.8838715Z 2025-12-04T12:12:57.8839608Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.6065s] [100%] 2025-12-04T12:12:57.8840502Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1695s] [100%] 2025-12-04T12:12:57.8841327Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1666s] [100%] 2025-12-04T12:12:57.8841335Z 2025-12-04T12:12:57.8841471Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8842036Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8842220Z Traceback (most recent call last): 2025-12-04T12:12:57.8842686Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8842898Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8843106Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8843112Z 2025-12-04T12:12:57.8843334Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8844270Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8844313Z 2025-12-04T12:12:57.8844588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8844804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8844915Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8845082Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8845416Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8845631Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8845740Z graph_break [] 2025-12-04T12:12:57.8845950Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8848641Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8848775Z return x.grad, w.grad 2025-12-04T12:12:57.8849493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8849603Z warnings.warn( 2025-12-04T12:12:57.8852220Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8852338Z return x.grad, w.grad 2025-12-04T12:12:57.8852893Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8853023Z Traceback (most recent call last): 2025-12-04T12:12:57.8853480Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8853673Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8853894Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8853901Z 2025-12-04T12:12:57.8854107Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8855056Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8855062Z 2025-12-04T12:12:57.8855321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8855536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8855659Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8855774Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8856120Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8856334Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.8856431Z graph_break [] 2025-12-04T12:12:57.8856656Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8859352Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8859502Z return x.grad, w.grad 2025-12-04T12:12:57.8860218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8860345Z warnings.warn( 2025-12-04T12:12:57.8862999Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8863136Z return x.grad, w.grad 2025-12-04T12:12:57.8863364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8863470Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8863584Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8863821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8864151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8864261Z graph_break [] 2025-12-04T12:12:57.8864470Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8867104Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8867220Z return x.grad, w.grad 2025-12-04T12:12:57.8867930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8868042Z warnings.warn( 2025-12-04T12:12:57.8870680Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8870795Z return x.grad, w.grad 2025-12-04T12:12:57.8870935Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8871490Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.8871622Z Traceback (most recent call last): 2025-12-04T12:12:57.8872115Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8872320Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8872524Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8872531Z 2025-12-04T12:12:57.8872772Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8873715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8873721Z 2025-12-04T12:12:57.8873981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8874233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8874342Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8874456Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8874800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8875014Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8875121Z graph_break [] 2025-12-04T12:12:57.8875339Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8878017Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8878139Z return x.grad, w.grad 2025-12-04T12:12:57.8878848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8878962Z warnings.warn( 2025-12-04T12:12:57.8881592Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8881708Z return x.grad, w.grad 2025-12-04T12:12:57.8881918Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8882026Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8882229Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8882450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8882787Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8882898Z graph_break [] 2025-12-04T12:12:57.8883108Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8885762Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8885910Z return x.grad, w.grad 2025-12-04T12:12:57.8886667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8886769Z warnings.warn( 2025-12-04T12:12:57.8889439Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.8889560Z return x.grad, w.grad 2025-12-04T12:12:57.8889771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8889895Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8890036Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8890254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8890595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8890690Z graph_break [] 2025-12-04T12:12:57.8890911Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8891628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8891725Z warnings.warn( 2025-12-04T12:12:57.8894378Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.8894485Z return x.grad, w.grad 2025-12-04T12:12:57.8895299Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml - 2025-12-04T12:12:57.8895467Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8896538Z FAILED [0.1666s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8896543Z 2025-12-04T12:12:57.8896757Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8897690Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8897696Z 2025-12-04T12:12:57.8897969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8898145Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.8898358Z ================== 1 failed, 174 deselected, 2 rerun in 5.00s ================== 2025-12-04T12:12:57.8898455Z Got exit code 1 2025-12-04T12:12:57.8899335Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.8899749Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.8900421Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml 2025-12-04T12:12:57.8900594Z ============================= test session starts ============================== 2025-12-04T12:12:57.8901196Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8901304Z cachedir: .pytest_cache 2025-12-04T12:12:57.8901896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8902018Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8902123Z configfile: pytest.ini 2025-12-04T12:12:57.8902713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8902934Z collecting ... collected 380 items / 136 deselected / 244 selected 2025-12-04T12:12:57.8903151Z stepcurrent: skipping 136 already run items. 2025-12-04T12:12:57.8903263Z Running 39 items in this shard 2025-12-04T12:12:57.8903269Z 2025-12-04T12:12:57.8904281Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) 
[ 2%] 2025-12-04T12:12:57.8905301Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0032s] (Skip non-critical tests to save resources.) [ 5%] 2025-12-04T12:12:57.8906302Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 7%] 2025-12-04T12:12:57.8907209Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5874s] [ 10%] 2025-12-04T12:12:57.8908098Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1663s] [ 10%] 2025-12-04T12:12:57.8908927Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1649s] [ 10%] 2025-12-04T12:12:57.8908934Z 2025-12-04T12:12:57.8909071Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8909647Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8909769Z Traceback (most recent call last): 2025-12-04T12:12:57.8910237Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8910443Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8910650Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8910655Z 2025-12-04T12:12:57.8910863Z To 
execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8911818Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8911868Z 2025-12-04T12:12:57.8912128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8912356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8912469Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8912583Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8912969Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8913186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8913298Z graph_break [] 2025-12-04T12:12:57.8913508Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8914259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8914372Z warnings.warn( 2025-12-04T12:12:57.8914928Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8915045Z Traceback (most recent call last): 2025-12-04T12:12:57.8915513Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8915741Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8915960Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8915965Z 2025-12-04T12:12:57.8916173Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8917108Z 
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8917127Z 2025-12-04T12:12:57.8917385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8917598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8917721Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8917833Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8918167Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8918399Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8918494Z graph_break [] 2025-12-04T12:12:57.8918701Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8919433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8919537Z warnings.warn( 2025-12-04T12:12:57.8919757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8919865Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8919978Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8920203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8920533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8920631Z graph_break [] 2025-12-04T12:12:57.8920858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8921564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation 
natively, skipping 2025-12-04T12:12:57.8921675Z warnings.warn( 2025-12-04T12:12:57.8921813Z =================================== FAILURES =================================== 2025-12-04T12:12:57.8922455Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8922590Z Traceback (most recent call last): 2025-12-04T12:12:57.8923084Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8923292Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8923501Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8923509Z 2025-12-04T12:12:57.8923749Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8924696Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8924701Z 2025-12-04T12:12:57.8924961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8925214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8925322Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8925435Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8925776Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8925987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8926081Z graph_break [] 2025-12-04T12:12:57.8926308Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8927051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 
compilation natively, skipping 2025-12-04T12:12:57.8927162Z warnings.warn( 2025-12-04T12:12:57.8927372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8927480Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8927608Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8927819Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8928149Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8928254Z graph_break [] 2025-12-04T12:12:57.8928464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8929185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8929284Z warnings.warn( 2025-12-04T12:12:57.8929491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8929613Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8929726Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8929938Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8930277Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8930370Z graph_break [] 2025-12-04T12:12:57.8930578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8931295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8931390Z warnings.warn( 2025-12-04T12:12:57.8932199Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml - 
2025-12-04T12:12:57.8932363Z =========================== short test summary info ============================ 2025-12-04T12:12:57.8936160Z FAILED [0.1649s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8936180Z 2025-12-04T12:12:57.8936420Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8937490Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8937495Z 2025-12-04T12:12:57.8937772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8937948Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8938167Z ============ 1 failed, 3 skipped, 136 deselected, 2 rerun in 4.99s ============= 2025-12-04T12:12:57.8938277Z Got exit code 1 2025-12-04T12:12:57.8938382Z Retrying single test... 
2025-12-04T12:12:57.8939047Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml 2025-12-04T12:12:57.8939264Z ============================= test session starts ============================== 2025-12-04T12:12:57.8939613Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8939740Z cachedir: .pytest_cache 2025-12-04T12:12:57.8940244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8940364Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8940516Z configfile: pytest.ini 2025-12-04T12:12:57.8941089Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8941309Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8942354Z stepcurrent: skipping 139 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8942466Z Running 1 items in this shard 2025-12-04T12:12:57.8942473Z 2025-12-04T12:12:57.8943382Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5868s] [100%] 2025-12-04T12:12:57.8944276Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1619s] [100%] 2025-12-04T12:12:57.8945104Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1601s] [100%] 2025-12-04T12:12:57.8945111Z 2025-12-04T12:12:57.8945248Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8945806Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8945941Z Traceback (most recent call last): 2025-12-04T12:12:57.8946400Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8946602Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8946812Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8946817Z 2025-12-04T12:12:57.8947027Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8947971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8947976Z 2025-12-04T12:12:57.8948308Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8948538Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8948677Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8948793Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8949134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8949348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8949445Z graph_break [] 2025-12-04T12:12:57.8949669Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8950389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8950504Z warnings.warn( 2025-12-04T12:12:57.8951118Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8951239Z Traceback (most recent call last): 2025-12-04T12:12:57.8951712Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8951908Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8952116Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8952135Z 2025-12-04T12:12:57.8952342Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8953801Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8953806Z 2025-12-04T12:12:57.8954080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8954296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8954424Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8954538Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8954875Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8955105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8955202Z graph_break [] 2025-12-04T12:12:57.8955412Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8956144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8956240Z warnings.warn( 2025-12-04T12:12:57.8956474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8956580Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8956691Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8956923Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8957251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8957348Z graph_break [] 2025-12-04T12:12:57.8957569Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8958282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8958398Z warnings.warn( 2025-12-04T12:12:57.8958537Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8959097Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8959230Z Traceback (most recent call last): 2025-12-04T12:12:57.8959738Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8959945Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8960180Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8960186Z 2025-12-04T12:12:57.8960393Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8961343Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8961350Z 2025-12-04T12:12:57.8961609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8961836Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8961944Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8962056Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8962545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8962763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8962861Z graph_break [] 2025-12-04T12:12:57.8963085Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8963797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8963943Z warnings.warn( 
2025-12-04T12:12:57.8964152Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8964259Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8964385Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8964598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8964929Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8965041Z graph_break [] 2025-12-04T12:12:57.8965254Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8965975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8966073Z warnings.warn( 2025-12-04T12:12:57.8966284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8966406Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8966516Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8966734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8967065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8967161Z graph_break [] 2025-12-04T12:12:57.8967371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8968088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8968188Z warnings.warn( 2025-12-04T12:12:57.8968998Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml - 2025-12-04T12:12:57.8969163Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.8970231Z FAILED [0.1601s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8970248Z 2025-12-04T12:12:57.8970458Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8971423Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8971460Z 2025-12-04T12:12:57.8971730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8971902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.8972108Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ================== 2025-12-04T12:12:57.8972204Z Got exit code 1 2025-12-04T12:12:57.8972307Z Retrying single test... 
2025-12-04T12:12:57.8972942Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml 2025-12-04T12:12:57.8973097Z ============================= test session starts ============================== 2025-12-04T12:12:57.8973464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.8973583Z cachedir: .pytest_cache 2025-12-04T12:12:57.8974089Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.8974224Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.8974323Z configfile: pytest.ini 2025-12-04T12:12:57.8974897Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.8975166Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.8976181Z stepcurrent: skipping 139 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8976305Z Running 1 items in this shard 2025-12-04T12:12:57.8976310Z 2025-12-04T12:12:57.8977207Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5539s] [100%] 2025-12-04T12:12:57.8978102Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1620s] [100%] 2025-12-04T12:12:57.8978928Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1552s] [100%] 2025-12-04T12:12:57.8978936Z 2025-12-04T12:12:57.8979071Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.8979634Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8979754Z Traceback (most recent call last): 2025-12-04T12:12:57.8980215Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8980419Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8980624Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8980629Z 2025-12-04T12:12:57.8980848Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8981780Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8981787Z 2025-12-04T12:12:57.8982056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8982266Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8982373Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8982497Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8982866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8983112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8983216Z graph_break [] 2025-12-04T12:12:57.8983423Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8984153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8984254Z warnings.warn( 2025-12-04T12:12:57.8984810Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8984940Z Traceback (most recent call last): 2025-12-04T12:12:57.8985430Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8985631Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8985849Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8985856Z 2025-12-04T12:12:57.8986065Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8987013Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8987048Z 2025-12-04T12:12:57.8987307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8987516Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8987634Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8987745Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8988089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8988301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8988399Z graph_break [] 2025-12-04T12:12:57.8988620Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8989333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8989435Z warnings.warn( 2025-12-04T12:12:57.8989654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8989762Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8989886Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8990094Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8990424Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8990533Z graph_break [] 2025-12-04T12:12:57.8990741Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8991450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8991560Z warnings.warn( 2025-12-04T12:12:57.8991698Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.8992267Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.8992385Z Traceback (most recent call last): 2025-12-04T12:12:57.8992840Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.8993044Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.8993256Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.8993261Z 2025-12-04T12:12:57.8993506Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.8994497Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.8994503Z 2025-12-04T12:12:57.8994758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.8994981Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8995087Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8995199Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8995542Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8995754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8995863Z graph_break [] 2025-12-04T12:12:57.8996107Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8996821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8996932Z warnings.warn( 
2025-12-04T12:12:57.8997145Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8997251Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8997404Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8997615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.8997951Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.8998044Z graph_break [] 2025-12-04T12:12:57.8998252Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.8998977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.8999074Z warnings.warn( 2025-12-04T12:12:57.8999281Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.8999402Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.8999511Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.8999735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9000061Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9000159Z graph_break [] 2025-12-04T12:12:57.9000375Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9001320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9001423Z warnings.warn( 2025-12-04T12:12:57.9002303Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml - 2025-12-04T12:12:57.9002471Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9003550Z FAILED [0.1552s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9003559Z 2025-12-04T12:12:57.9003770Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9004724Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9004730Z 2025-12-04T12:12:57.9005087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9005270Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9005515Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.9005611Z Got exit code 1 2025-12-04T12:12:57.9006480Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9006887Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9007508Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml 2025-12-04T12:12:57.9007682Z ============================= test session starts ============================== 2025-12-04T12:12:57.9008069Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9008176Z cachedir: .pytest_cache 2025-12-04T12:12:57.9008695Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9008817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9008938Z configfile: pytest.ini 2025-12-04T12:12:57.9009513Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9009784Z collecting ... collected 380 items / 140 deselected / 240 selected 2025-12-04T12:12:57.9009940Z stepcurrent: skipping 140 already run items. 2025-12-04T12:12:57.9010051Z Running 35 items in this shard 2025-12-04T12:12:57.9010056Z 2025-12-04T12:12:57.9011084Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 2%] 2025-12-04T12:12:57.9012084Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) 
[ 5%] 2025-12-04T12:12:57.9012978Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5577s] [ 8%] 2025-12-04T12:12:57.9013878Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1590s] [ 8%] 2025-12-04T12:12:57.9014704Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1571s] [ 8%] 2025-12-04T12:12:57.9014710Z 2025-12-04T12:12:57.9014859Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9015418Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9015547Z Traceback (most recent call last): 2025-12-04T12:12:57.9016007Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9016201Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9016420Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9016425Z 2025-12-04T12:12:57.9016633Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9017625Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9017630Z 2025-12-04T12:12:57.9017921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9018134Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.9018254Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9018365Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9018697Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9018926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9019022Z graph_break [] 2025-12-04T12:12:57.9019246Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9019998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9020103Z warnings.warn( 2025-12-04T12:12:57.9020674Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9020796Z Traceback (most recent call last): 2025-12-04T12:12:57.9021273Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9021467Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9021706Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9021710Z 2025-12-04T12:12:57.9021936Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9022869Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9022874Z 2025-12-04T12:12:57.9023156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9023369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9023485Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:57.9023611Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9023943Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9024158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9024271Z graph_break [] 2025-12-04T12:12:57.9024484Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9025214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9025313Z warnings.warn( 2025-12-04T12:12:57.9025528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9025654Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9025764Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9025979Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9026322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9026419Z graph_break [] 2025-12-04T12:12:57.9026646Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9027354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9027451Z warnings.warn( 2025-12-04T12:12:57.9027605Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9028157Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9028307Z Traceback (most recent call last): 2025-12-04T12:12:57.9028780Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.9029001Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9029218Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9029223Z 2025-12-04T12:12:57.9029431Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9030369Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9030390Z 2025-12-04T12:12:57.9030645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9030885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9031010Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9031120Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9031449Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9031674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9031768Z graph_break [] 2025-12-04T12:12:57.9031977Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9032728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9032824Z warnings.warn( 2025-12-04T12:12:57.9033044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9033150Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9033264Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9033489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9033818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9033914Z graph_break [] 2025-12-04T12:12:57.9034137Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9034847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9034962Z warnings.warn( 2025-12-04T12:12:57.9035171Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9035277Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9035401Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9035618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9035949Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9036056Z graph_break [] 2025-12-04T12:12:57.9036263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9036987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9037084Z warnings.warn( 2025-12-04T12:12:57.9037882Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml - 2025-12-04T12:12:57.9038061Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9039116Z FAILED [0.1571s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9039124Z 2025-12-04T12:12:57.9039394Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:57.9040328Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9040363Z 2025-12-04T12:12:57.9040634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9040809Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9041023Z ============ 1 failed, 2 skipped, 140 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:57.9041129Z Got exit code 1 2025-12-04T12:12:57.9041230Z Retrying single test... 2025-12-04T12:12:57.9041851Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml 2025-12-04T12:12:57.9042055Z ============================= test session starts ============================== 2025-12-04T12:12:57.9042474Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9042597Z cachedir: .pytest_cache 2025-12-04T12:12:57.9043106Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9043225Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9043340Z configfile: pytest.ini 2025-12-04T12:12:57.9043950Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9044172Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9045209Z stepcurrent: skipping 142 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9045322Z Running 1 items in this shard 2025-12-04T12:12:57.9045327Z 2025-12-04T12:12:57.9046239Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [100%] 2025-12-04T12:12:57.9047133Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%] 2025-12-04T12:12:57.9047966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1601s] [100%] 2025-12-04T12:12:57.9047971Z 2025-12-04T12:12:57.9048108Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9048668Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9048800Z Traceback (most recent call last): 2025-12-04T12:12:57.9049260Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9049464Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9049667Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9049674Z 2025-12-04T12:12:57.9049882Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9050822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9050827Z 2025-12-04T12:12:57.9051083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9051344Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9051450Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9051592Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9051934Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9052147Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9052241Z graph_break [] 2025-12-04T12:12:57.9052464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9053180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9053292Z warnings.warn( 2025-12-04T12:12:57.9053846Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9053993Z Traceback (most recent call last): 2025-12-04T12:12:57.9054464Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9054656Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9054859Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9054878Z 2025-12-04T12:12:57.9055085Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9056047Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9056053Z 2025-12-04T12:12:57.9056326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9056537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9056662Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9056773Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9057102Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9057330Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9057426Z graph_break [] 2025-12-04T12:12:57.9057636Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9058360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9058459Z warnings.warn( 2025-12-04T12:12:57.9058679Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9058784Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9058893Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9059120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9059447Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9059545Z graph_break [] 2025-12-04T12:12:57.9059760Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9060464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9060574Z warnings.warn( 2025-12-04T12:12:57.9060711Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9061267Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9061395Z Traceback (most recent call last): 2025-12-04T12:12:57.9061855Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9062077Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9062295Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9062329Z 2025-12-04T12:12:57.9062539Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9063484Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9063492Z 2025-12-04T12:12:57.9063745Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9063954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9064073Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9064181Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9064550Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9064763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9064860Z graph_break [] 2025-12-04T12:12:57.9065080Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9065792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9065893Z warnings.warn( 
2025-12-04T12:12:57.9066148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9066255Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9066382Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9066594Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9066919Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9067032Z graph_break [] 2025-12-04T12:12:57.9067240Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9067952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9068063Z warnings.warn( 2025-12-04T12:12:57.9068269Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9068386Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9068497Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9068711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9069051Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9069147Z graph_break [] 2025-12-04T12:12:57.9069353Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9070078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9070178Z warnings.warn( 2025-12-04T12:12:57.9070989Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml - 2025-12-04T12:12:57.9071157Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9072221Z FAILED [0.1601s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9072227Z 2025-12-04T12:12:57.9072449Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9073419Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9073424Z 2025-12-04T12:12:57.9073723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9073897Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9074091Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.9074210Z Got exit code 1 2025-12-04T12:12:57.9074316Z Retrying single test... 
2025-12-04T12:12:57.9074959Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml 2025-12-04T12:12:57.9075115Z ============================= test session starts ============================== 2025-12-04T12:12:57.9075456Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9075608Z cachedir: .pytest_cache 2025-12-04T12:12:57.9076123Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9076243Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9076365Z configfile: pytest.ini 2025-12-04T12:12:57.9076936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9077173Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9078236Z stepcurrent: skipping 142 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9078348Z Running 1 items in this shard 2025-12-04T12:12:57.9078354Z 2025-12-04T12:12:57.9079273Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5551s] [100%] 2025-12-04T12:12:57.9080179Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%] 2025-12-04T12:12:57.9081013Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1608s] [100%] 2025-12-04T12:12:57.9081020Z 2025-12-04T12:12:57.9081160Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9081728Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9081846Z Traceback (most recent call last): 2025-12-04T12:12:57.9082382Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9082596Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9082807Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9082813Z 2025-12-04T12:12:57.9083035Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9083969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9083976Z 2025-12-04T12:12:57.9084236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9084467Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9084578Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9084708Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9085085Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9085303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9085446Z graph_break [] 2025-12-04T12:12:57.9085656Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9086378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9086494Z warnings.warn( 2025-12-04T12:12:57.9087052Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9087184Z Traceback (most recent call last): 2025-12-04T12:12:57.9087644Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9087866Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9088090Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9088097Z 2025-12-04T12:12:57.9088306Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9089247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9089280Z 2025-12-04T12:12:57.9089538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9089748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9089867Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9089978Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9090307Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9090536Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9090629Z graph_break [] 2025-12-04T12:12:57.9090850Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9091565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9091663Z warnings.warn( 2025-12-04T12:12:57.9091886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9091996Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9092107Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9092331Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9092656Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9092768Z graph_break [] 2025-12-04T12:12:57.9092980Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9093690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9093806Z warnings.warn( 2025-12-04T12:12:57.9093944Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9094498Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9094630Z Traceback (most recent call last): 2025-12-04T12:12:57.9095090Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9095296Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9095503Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9095507Z 2025-12-04T12:12:57.9095749Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9096707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9096742Z 2025-12-04T12:12:57.9097002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9097225Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9097336Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9097447Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9097789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9098000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9098096Z graph_break [] 2025-12-04T12:12:57.9098347Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9099062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9099175Z warnings.warn( 
2025-12-04T12:12:57.9099383Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9099489Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9099612Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9099854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9100182Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9100287Z graph_break [] 2025-12-04T12:12:57.9100493Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9101580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9101682Z warnings.warn( 2025-12-04T12:12:57.9101891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9102014Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9102176Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9102472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9102815Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9102914Z graph_break [] 2025-12-04T12:12:57.9103134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9103847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9103951Z warnings.warn( 2025-12-04T12:12:57.9104766Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml - 2025-12-04T12:12:57.9104934Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9106019Z FAILED [0.1608s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9106027Z 2025-12-04T12:12:57.9106238Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9107167Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9107185Z 2025-12-04T12:12:57.9107447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9107721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9107935Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:57.9108077Z Got exit code 1 2025-12-04T12:12:57.9108927Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9109345Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9109969Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml 2025-12-04T12:12:57.9110143Z ============================= test session starts ============================== 2025-12-04T12:12:57.9110523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9110634Z cachedir: .pytest_cache 2025-12-04T12:12:57.9111152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9111274Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9111396Z configfile: pytest.ini 2025-12-04T12:12:57.9111971Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9112234Z collecting ... collected 380 items / 143 deselected / 237 selected 2025-12-04T12:12:57.9112392Z stepcurrent: skipping 143 already run items. 2025-12-04T12:12:57.9112506Z Running 32 items in this shard 2025-12-04T12:12:57.9112511Z 2025-12-04T12:12:57.9113529Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0042s] (Skip non-critical tests to save resources.) [ 3%] 2025-12-04T12:12:57.9114553Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 6%] 2025-12-04T12:12:57.9115549Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0038s] (Skip non-critical tests to save resources.) 
[ 9%] 2025-12-04T12:12:57.9116461Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5282s] [ 12%] 2025-12-04T12:12:57.9117355Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [ 12%] 2025-12-04T12:12:57.9118188Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [ 12%] 2025-12-04T12:12:57.9118196Z 2025-12-04T12:12:57.9118336Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9118909Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9119029Z Traceback (most recent call last): 2025-12-04T12:12:57.9119491Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9119697Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9119904Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9119910Z 2025-12-04T12:12:57.9120139Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9121189Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9121223Z 2025-12-04T12:12:57.9121484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9121713Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.9121825Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9121935Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9122349Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9122566Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9122674Z graph_break [] 2025-12-04T12:12:57.9122939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9123664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9123775Z warnings.warn( 2025-12-04T12:12:57.9124331Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9124465Z Traceback (most recent call last): 2025-12-04T12:12:57.9124961Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9125154Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9125375Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9125380Z 2025-12-04T12:12:57.9125589Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9126531Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9126552Z 2025-12-04T12:12:57.9126811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9127022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9127144Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:57.9127255Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9127590Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9127814Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9127909Z graph_break [] 2025-12-04T12:12:57.9128130Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9128852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9128953Z warnings.warn( 2025-12-04T12:12:57.9129181Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9129288Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9129400Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9129625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9129953Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9130064Z graph_break [] 2025-12-04T12:12:57.9130272Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9130983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9131091Z warnings.warn( 2025-12-04T12:12:57.9131284Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9131843Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9132002Z Traceback (most recent call last): 2025-12-04T12:12:57.9132462Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.9132665Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9132874Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9132878Z 2025-12-04T12:12:57.9133089Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9134037Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9134073Z 2025-12-04T12:12:57.9134337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9134561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9134674Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9134788Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9135133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9135349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9135477Z graph_break [] 2025-12-04T12:12:57.9135699Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9136415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9136528Z warnings.warn( 2025-12-04T12:12:57.9136739Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9136852Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9136981Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9137202Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9137533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9137644Z graph_break [] 2025-12-04T12:12:57.9137855Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9138587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9138686Z warnings.warn( 2025-12-04T12:12:57.9138897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9139021Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9139134Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9139355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9139698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9139798Z graph_break [] 2025-12-04T12:12:57.9140023Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9140734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9140835Z warnings.warn( 2025-12-04T12:12:57.9141649Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml - 2025-12-04T12:12:57.9141817Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9142936Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9142969Z 2025-12-04T12:12:57.9143183Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:57.9144116Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9144138Z 2025-12-04T12:12:57.9144400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9144578Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9144810Z ============ 1 failed, 3 skipped, 143 deselected, 2 rerun in 4.91s ============= 2025-12-04T12:12:57.9144911Z Got exit code 1 2025-12-04T12:12:57.9145017Z Retrying single test... 2025-12-04T12:12:57.9145693Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml 2025-12-04T12:12:57.9145854Z ============================= test session starts ============================== 2025-12-04T12:12:57.9146209Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9146317Z cachedir: .pytest_cache 2025-12-04T12:12:57.9146827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9146995Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9147097Z configfile: pytest.ini 2025-12-04T12:12:57.9147671Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9147912Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9148934Z stepcurrent: skipping 146 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9149065Z Running 1 items in this shard 2025-12-04T12:12:57.9149070Z 2025-12-04T12:12:57.9149968Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5536s] [100%] 2025-12-04T12:12:57.9150869Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1586s] [100%] 2025-12-04T12:12:57.9151702Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [100%] 2025-12-04T12:12:57.9151708Z 2025-12-04T12:12:57.9151845Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9152414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9152530Z Traceback (most recent call last): 2025-12-04T12:12:57.9153002Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9153196Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9153401Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9153406Z 2025-12-04T12:12:57.9153624Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9154596Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9154602Z 2025-12-04T12:12:57.9154877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9155118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9155226Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9155352Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9155682Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9155896Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9156005Z graph_break [] 2025-12-04T12:12:57.9156218Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9156952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9157081Z warnings.warn( 2025-12-04T12:12:57.9157640Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9157776Z Traceback (most recent call last): 2025-12-04T12:12:57.9158235Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9158430Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9158650Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9158685Z 2025-12-04T12:12:57.9158895Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9159837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9159844Z 2025-12-04T12:12:57.9160105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9160332Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9160442Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9160556Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9160898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9161110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9161207Z graph_break [] 2025-12-04T12:12:57.9161428Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9162215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9162332Z warnings.warn( 2025-12-04T12:12:57.9162544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9162654Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9162779Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9162993Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9163323Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9163432Z graph_break [] 2025-12-04T12:12:57.9163643Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9164355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9164466Z warnings.warn( 2025-12-04T12:12:57.9164606Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9165177Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9165297Z Traceback (most recent call last): 2025-12-04T12:12:57.9165793Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9166041Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9166250Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9166254Z 2025-12-04T12:12:57.9166475Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9167406Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9167411Z 2025-12-04T12:12:57.9167674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9167898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9168039Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9168175Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9168505Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9168720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9168829Z graph_break [] 2025-12-04T12:12:57.9169036Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9169751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9169893Z warnings.warn( 
2025-12-04T12:12:57.9170100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9170220Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9170329Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9170544Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9170886Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9170982Z graph_break [] 2025-12-04T12:12:57.9171190Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9171907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9172005Z warnings.warn( 2025-12-04T12:12:57.9172223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9172331Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9172441Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9172666Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9172994Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9173089Z graph_break [] 2025-12-04T12:12:57.9173312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9174023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9174132Z warnings.warn( 2025-12-04T12:12:57.9174931Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml - 2025-12-04T12:12:57.9175097Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9176171Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9176177Z 2025-12-04T12:12:57.9176427Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9177376Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9177412Z 2025-12-04T12:12:57.9177672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9177847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9178056Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.9178156Z Got exit code 1 2025-12-04T12:12:57.9178275Z Retrying single test... 
2025-12-04T12:12:57.9178901Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml 2025-12-04T12:12:57.9179092Z ============================= test session starts ============================== 2025-12-04T12:12:57.9179445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9179551Z cachedir: .pytest_cache 2025-12-04T12:12:57.9180058Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9180193Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9180297Z configfile: pytest.ini 2025-12-04T12:12:57.9180887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9181140Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9182160Z stepcurrent: skipping 146 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9182287Z Running 1 items in this shard 2025-12-04T12:12:57.9182292Z 2025-12-04T12:12:57.9183189Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5710s] [100%] 2025-12-04T12:12:57.9184092Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1658s] [100%] 2025-12-04T12:12:57.9184908Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1594s] [100%] 2025-12-04T12:12:57.9184914Z 2025-12-04T12:12:57.9185064Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9185625Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9185746Z Traceback (most recent call last): 2025-12-04T12:12:57.9186222Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9186418Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9186620Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9186639Z 2025-12-04T12:12:57.9186851Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9187782Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9187788Z 2025-12-04T12:12:57.9188057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9188314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9188439Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9188551Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9188916Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9189142Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9189238Z graph_break [] 2025-12-04T12:12:57.9189447Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9190180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9190278Z warnings.warn( 2025-12-04T12:12:57.9190850Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9191000Z Traceback (most recent call last): 2025-12-04T12:12:57.9191462Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9191669Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9191876Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9191881Z 2025-12-04T12:12:57.9192089Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9193037Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9193080Z 2025-12-04T12:12:57.9193338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9193560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9193669Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9193781Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9194128Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9194343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9194450Z graph_break [] 2025-12-04T12:12:57.9194658Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9195374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9195495Z warnings.warn( 2025-12-04T12:12:57.9195703Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9195810Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9195937Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9196152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9196498Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9196595Z graph_break [] 2025-12-04T12:12:57.9196809Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9197531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9197629Z warnings.warn( 2025-12-04T12:12:57.9197772Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9198344Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9198464Z Traceback (most recent call last): 2025-12-04T12:12:57.9198940Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9199142Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9199384Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9199416Z 2025-12-04T12:12:57.9199644Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9200581Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9200589Z 2025-12-04T12:12:57.9201046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9201258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9201371Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9201499Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9201834Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9202177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9202295Z graph_break [] 2025-12-04T12:12:57.9202506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9203243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9203343Z warnings.warn( 
2025-12-04T12:12:57.9203552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9203724Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9203838Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9204051Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9204398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9204494Z graph_break [] 2025-12-04T12:12:57.9204723Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9205432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9205531Z warnings.warn( 2025-12-04T12:12:57.9205748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9205856Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9205967Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9206197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9206522Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9206633Z graph_break [] 2025-12-04T12:12:57.9206838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9207547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9207655Z warnings.warn( 2025-12-04T12:12:57.9208460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml - 2025-12-04T12:12:57.9208642Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9209709Z FAILED [0.1594s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9209717Z 2025-12-04T12:12:57.9209932Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9210942Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9210948Z 2025-12-04T12:12:57.9211211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9211437Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9211630Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.9211726Z Got exit code 1 2025-12-04T12:12:57.9212590Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9212989Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9213627Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml 2025-12-04T12:12:57.9213819Z ============================= test session starts ============================== 2025-12-04T12:12:57.9214164Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9214284Z cachedir: .pytest_cache 2025-12-04T12:12:57.9214794Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9214932Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9215070Z configfile: pytest.ini 2025-12-04T12:12:57.9215643Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9215880Z collecting ... collected 380 items / 147 deselected / 233 selected 2025-12-04T12:12:57.9216024Z stepcurrent: skipping 147 already run items. 2025-12-04T12:12:57.9216141Z Running 28 items in this shard 2025-12-04T12:12:57.9216145Z 2025-12-04T12:12:57.9217174Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 3%] 2025-12-04T12:12:57.9218077Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6359s] [ 7%] 2025-12-04T12:12:57.9218991Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1665s] [ 7%] 2025-12-04T12:12:57.9219807Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [ 7%] 2025-12-04T12:12:57.9219812Z 2025-12-04T12:12:57.9219967Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9220525Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9220647Z Traceback (most recent call last): 2025-12-04T12:12:57.9221120Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9221314Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9221538Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9221543Z 2025-12-04T12:12:57.9221752Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9222683Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9222688Z 2025-12-04T12:12:57.9222992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9223208Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9223359Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9223472Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9223803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9224031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9224128Z graph_break [] 2025-12-04T12:12:57.9224337Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9225071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9225168Z warnings.warn( 2025-12-04T12:12:57.9225770Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9225890Z Traceback (most recent call last): 2025-12-04T12:12:57.9226349Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.9226554Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9226762Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9226766Z 2025-12-04T12:12:57.9227016Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9227946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9227952Z 2025-12-04T12:12:57.9228210Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9228439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9228548Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9228661Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9229004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9229217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9229326Z graph_break [] 2025-12-04T12:12:57.9229535Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9230257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9230368Z warnings.warn( 2025-12-04T12:12:57.9230576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9230683Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9230809Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9231027Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9231368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9231466Z graph_break [] 2025-12-04T12:12:57.9231673Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9232400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9232500Z warnings.warn( 2025-12-04T12:12:57.9232640Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9233210Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9233329Z Traceback (most recent call last): 2025-12-04T12:12:57.9233838Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9234040Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9234274Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9234279Z 2025-12-04T12:12:57.9234498Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9235435Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9235443Z 2025-12-04T12:12:57.9235713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9235924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9236033Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9236194Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9236528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9236758Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9236860Z graph_break [] 2025-12-04T12:12:57.9237068Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9237798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9237928Z warnings.warn( 2025-12-04T12:12:57.9238136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9238254Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9238364Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9238589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9238918Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9239015Z graph_break [] 2025-12-04T12:12:57.9239235Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9239947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9240045Z warnings.warn( 2025-12-04T12:12:57.9240264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9240374Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9240496Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9240708Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9241035Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9241141Z graph_break [] 2025-12-04T12:12:57.9241350Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9242059Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9242242Z warnings.warn( 2025-12-04T12:12:57.9243045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml - 2025-12-04T12:12:57.9243224Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9244298Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9244303Z 2025-12-04T12:12:57.9244516Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9245508Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9245542Z 2025-12-04T12:12:57.9245805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9245992Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9246209Z ============ 1 failed, 1 skipped, 147 deselected, 2 rerun in 5.02s ============= 2025-12-04T12:12:57.9246309Z Got exit code 1 2025-12-04T12:12:57.9246430Z Retrying single test... 
2025-12-04T12:12:57.9247056Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml 2025-12-04T12:12:57.9247231Z ============================= test session starts ============================== 2025-12-04T12:12:57.9247606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9247715Z cachedir: .pytest_cache 2025-12-04T12:12:57.9248239Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9248364Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9248470Z configfile: pytest.ini 2025-12-04T12:12:57.9249056Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9249326Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9250356Z stepcurrent: skipping 148 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9250469Z Running 1 items in this shard 2025-12-04T12:12:57.9250474Z 2025-12-04T12:12:57.9251375Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6463s] [100%] 2025-12-04T12:12:57.9252285Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1661s] [100%] 2025-12-04T12:12:57.9253105Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1613s] [100%] 2025-12-04T12:12:57.9253113Z 2025-12-04T12:12:57.9253264Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9253821Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9253957Z Traceback (most recent call last): 2025-12-04T12:12:57.9254419Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9254616Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9254843Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9254847Z 2025-12-04T12:12:57.9255055Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9255997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9256002Z 2025-12-04T12:12:57.9256262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9256474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9256599Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9256742Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9257076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9257337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9257436Z graph_break [] 2025-12-04T12:12:57.9257658Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9258379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9258482Z warnings.warn( 2025-12-04T12:12:57.9259054Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9259173Z Traceback (most recent call last): 2025-12-04T12:12:57.9259682Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9259877Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9260083Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9260088Z 2025-12-04T12:12:57.9260316Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9261248Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9261283Z 2025-12-04T12:12:57.9261556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9261767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9261878Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9262009Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9262342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9262558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9262675Z graph_break [] 2025-12-04T12:12:57.9262886Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9263616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9263717Z warnings.warn( 2025-12-04T12:12:57.9263925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9264049Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9264162Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9264372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9264715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9264813Z graph_break [] 2025-12-04T12:12:57.9265038Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9265751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9265850Z warnings.warn( 2025-12-04T12:12:57.9266004Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9266562Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9266679Z Traceback (most recent call last): 2025-12-04T12:12:57.9267154Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9267346Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9267569Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9267617Z 2025-12-04T12:12:57.9267829Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9268793Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9268811Z 2025-12-04T12:12:57.9269067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9269281Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9269402Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9269513Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9269841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9270063Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9270185Z graph_break [] 2025-12-04T12:12:57.9270399Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9271127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9271223Z warnings.warn( 
2025-12-04T12:12:57.9271442Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9271580Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9271692Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9271915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9272242Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9272336Z graph_break [] 2025-12-04T12:12:57.9272557Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9273270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9273384Z warnings.warn( 2025-12-04T12:12:57.9273591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9273700Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9273823Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9274035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9274364Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9274473Z graph_break [] 2025-12-04T12:12:57.9274680Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9275404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9275503Z warnings.warn( 2025-12-04T12:12:57.9276301Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml - 2025-12-04T12:12:57.9276485Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9277543Z FAILED [0.1613s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9277551Z 2025-12-04T12:12:57.9277774Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9278706Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9278714Z 2025-12-04T12:12:57.9279020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9279197Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9279421Z ================== 1 failed, 174 deselected, 2 rerun in 5.03s ================== 2025-12-04T12:12:57.9279532Z Got exit code 1 2025-12-04T12:12:57.9279633Z Retrying single test... 
2025-12-04T12:12:57.9280255Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml 2025-12-04T12:12:57.9280426Z ============================= test session starts ============================== 2025-12-04T12:12:57.9280767Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9280886Z cachedir: .pytest_cache 2025-12-04T12:12:57.9281421Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9281545Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9281662Z configfile: pytest.ini 2025-12-04T12:12:57.9282312Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9282537Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9283573Z stepcurrent: skipping 148 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9283730Z Running 1 items in this shard 2025-12-04T12:12:57.9283735Z 2025-12-04T12:12:57.9284655Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5530s] [100%] 2025-12-04T12:12:57.9285551Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1618s] [100%] 2025-12-04T12:12:57.9286387Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1573s] [100%] 2025-12-04T12:12:57.9286393Z 2025-12-04T12:12:57.9286534Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9287088Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9287223Z Traceback (most recent call last): 2025-12-04T12:12:57.9287685Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9287895Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9288105Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9288112Z 2025-12-04T12:12:57.9288322Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9289268Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9289276Z 2025-12-04T12:12:57.9289536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9289764Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9289876Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9289989Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9290334Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9290584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9290679Z graph_break [] 2025-12-04T12:12:57.9290902Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9291652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9291763Z warnings.warn( 2025-12-04T12:12:57.9292326Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9292446Z Traceback (most recent call last): 2025-12-04T12:12:57.9292918Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9293109Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9293357Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9293376Z 2025-12-04T12:12:57.9293586Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9294521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9294528Z 2025-12-04T12:12:57.9294799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9295115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9295239Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9295351Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9295764Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9295996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9296092Z graph_break [] 2025-12-04T12:12:57.9296308Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9297038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9297138Z warnings.warn( 2025-12-04T12:12:57.9297363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9297470Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9297583Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9297812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9298139Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9298234Z graph_break [] 2025-12-04T12:12:57.9298455Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9299170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9299279Z warnings.warn( 2025-12-04T12:12:57.9299420Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9299978Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9300111Z Traceback (most recent call last): 2025-12-04T12:12:57.9300569Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9300762Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9301148Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9301154Z 2025-12-04T12:12:57.9301361Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9302386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9302430Z 2025-12-04T12:12:57.9302688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9302898Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9303020Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9303130Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9303473Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9303684Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9303777Z graph_break [] 2025-12-04T12:12:57.9304001Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9304764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9304864Z warnings.warn( 
2025-12-04T12:12:57.9305088Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9305198Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9305323Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9305538Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9305873Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9306022Z graph_break [] 2025-12-04T12:12:57.9306231Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9306945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9307056Z warnings.warn( 2025-12-04T12:12:57.9307265Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9307386Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9307499Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9307713Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9308052Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9308147Z graph_break [] 2025-12-04T12:12:57.9308351Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9309073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9309170Z warnings.warn( 2025-12-04T12:12:57.9309982Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml - 2025-12-04T12:12:57.9310151Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9311207Z FAILED [0.1573s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9311214Z 2025-12-04T12:12:57.9311438Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9312374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9312379Z 2025-12-04T12:12:57.9312650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9312824Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9313062Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:57.9313182Z Got exit code 1 2025-12-04T12:12:57.9314067Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9314479Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9315109Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml 2025-12-04T12:12:57.9316081Z ============================= test session starts ============================== 2025-12-04T12:12:57.9316740Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9317320Z cachedir: .pytest_cache 2025-12-04T12:12:57.9318055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9318833Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9319168Z configfile: pytest.ini 2025-12-04T12:12:57.9319932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9320884Z collecting ... collected 380 items / 149 deselected / 231 selected 2025-12-04T12:12:57.9321422Z stepcurrent: skipping 149 already run items. 2025-12-04T12:12:57.9321802Z Running 26 items in this shard 2025-12-04T12:12:57.9322022Z 2025-12-04T12:12:57.9323122Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 3%] 2025-12-04T12:12:57.9325351Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 7%] 2025-12-04T12:12:57.9327513Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [ 11%] 2025-12-04T12:12:57.9329653Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) 
[ 15%] 2025-12-04T12:12:57.9331683Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5405s] [ 19%] 2025-12-04T12:12:57.9333622Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1566s] [ 19%] 2025-12-04T12:12:57.9335474Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1564s] [ 19%] 2025-12-04T12:12:57.9336440Z 2025-12-04T12:12:57.9336579Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9337425Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9338228Z Traceback (most recent call last): 2025-12-04T12:12:57.9338931Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9339721Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9340249Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9340633Z 2025-12-04T12:12:57.9340844Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9342154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9343214Z 2025-12-04T12:12:57.9343488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9344112Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.9344565Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9344897Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9345443Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9346118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9346598Z graph_break [] 2025-12-04T12:12:57.9346970Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9348035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9348988Z warnings.warn( 2025-12-04T12:12:57.9349711Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9350554Z Traceback (most recent call last): 2025-12-04T12:12:57.9351234Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9352027Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9352559Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9352892Z 2025-12-04T12:12:57.9353119Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9354386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9355459Z 2025-12-04T12:12:57.9355718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9356335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9356799Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:57.9357115Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9357655Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9358343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9358777Z graph_break [] 2025-12-04T12:12:57.9359141Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9360217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9361172Z warnings.warn( 2025-12-04T12:12:57.9361533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9361999Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9362402Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9362821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9363511Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9364077Z graph_break [] 2025-12-04T12:12:57.9364431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9365504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9366487Z warnings.warn( 2025-12-04T12:12:57.9366791Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9367670Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9368483Z Traceback (most recent call last): 2025-12-04T12:12:57.9369179Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.9376308Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9376873Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9377216Z 2025-12-04T12:12:57.9377432Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9378826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9379909Z 2025-12-04T12:12:57.9380181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9380808Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9381269Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9381608Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9382159Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9382899Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9383342Z graph_break [] 2025-12-04T12:12:57.9383711Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9384795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9385734Z warnings.warn( 2025-12-04T12:12:57.9386118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9386580Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9386894Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9387319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9388004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9388574Z graph_break [] 2025-12-04T12:12:57.9388926Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9389997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9390947Z warnings.warn( 2025-12-04T12:12:57.9391306Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9391777Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9392109Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9392542Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9393211Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9393779Z graph_break [] 2025-12-04T12:12:57.9394139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9395193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9396135Z warnings.warn( 2025-12-04T12:12:57.9397092Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml - 2025-12-04T12:12:57.9398195Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9399615Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9401094Z 2025-12-04T12:12:57.9401316Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:57.9402713Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9403782Z 2025-12-04T12:12:57.9404063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9404649Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9405241Z ============ 1 failed, 4 skipped, 149 deselected, 2 rerun in 4.92s ============= 2025-12-04T12:12:57.9405711Z Got exit code 1 2025-12-04T12:12:57.9405982Z Retrying single test... 2025-12-04T12:12:57.9406783Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml 2025-12-04T12:12:57.9407722Z ============================= test session starts ============================== 2025-12-04T12:12:57.9408375Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9409009Z cachedir: .pytest_cache 2025-12-04T12:12:57.9409693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9410460Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9410813Z configfile: pytest.ini 2025-12-04T12:12:57.9411560Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9412496Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9413873Z stepcurrent: skipping 153 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9415142Z Running 1 items in this shard 2025-12-04T12:12:57.9415348Z 2025-12-04T12:12:57.9416255Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5673s] [100%] 2025-12-04T12:12:57.9418190Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [100%] 2025-12-04T12:12:57.9420046Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1578s] [100%] 2025-12-04T12:12:57.9421011Z 2025-12-04T12:12:57.9421149Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9421989Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9422790Z Traceback (most recent call last): 2025-12-04T12:12:57.9423490Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9424279Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9424813Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9425145Z 2025-12-04T12:12:57.9425358Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9426692Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9427805Z 2025-12-04T12:12:57.9428078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9428695Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9429147Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9429483Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9430032Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9430716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9431163Z graph_break [] 2025-12-04T12:12:57.9431527Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9432642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9433584Z warnings.warn( 2025-12-04T12:12:57.9434309Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9435125Z Traceback (most recent call last): 2025-12-04T12:12:57.9435805Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9436633Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9437175Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9437506Z 2025-12-04T12:12:57.9437730Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9439003Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9440076Z 2025-12-04T12:12:57.9440339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9440953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9441422Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9441737Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9442350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9443052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9443494Z graph_break [] 2025-12-04T12:12:57.9443867Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9444941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9445907Z warnings.warn( 2025-12-04T12:12:57.9446267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9446732Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9447062Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9447480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9448169Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9448736Z graph_break [] 2025-12-04T12:12:57.9449105Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9450163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9451108Z warnings.warn( 2025-12-04T12:12:57.9451416Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9452296Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9453118Z Traceback (most recent call last): 2025-12-04T12:12:57.9453847Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9454642Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9455170Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9455517Z 2025-12-04T12:12:57.9455727Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9457002Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9458062Z 2025-12-04T12:12:57.9458336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9458971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9459434Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9459769Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9460298Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9460994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9461440Z graph_break [] 2025-12-04T12:12:57.9461842Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9462899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9463846Z warnings.warn( 
2025-12-04T12:12:57.9464221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9464669Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9465002Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9465435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9466113Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9466671Z graph_break [] 2025-12-04T12:12:57.9467032Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9468094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9469031Z warnings.warn( 2025-12-04T12:12:57.9469401Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9469862Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9470189Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9470607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9471294Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9471861Z graph_break [] 2025-12-04T12:12:57.9472212Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9473271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9474220Z warnings.warn( 2025-12-04T12:12:57.9475182Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml - 2025-12-04T12:12:57.9476265Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9477698Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9478914Z 2025-12-04T12:12:57.9479129Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9480446Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9481511Z 2025-12-04T12:12:57.9481791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9482435Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9482951Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:57.9483391Z Got exit code 1 2025-12-04T12:12:57.9483644Z Retrying single test... 
2025-12-04T12:12:57.9484495Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml 2025-12-04T12:12:57.9485422Z ============================= test session starts ============================== 2025-12-04T12:12:57.9486069Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9486646Z cachedir: .pytest_cache 2025-12-04T12:12:57.9487335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9488135Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9488466Z configfile: pytest.ini 2025-12-04T12:12:57.9489231Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9490160Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9491546Z stepcurrent: skipping 153 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9492800Z Running 1 items in this shard 2025-12-04T12:12:57.9493012Z 2025-12-04T12:12:57.9493918Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5346s] [100%] 2025-12-04T12:12:57.9495851Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1571s] [100%] 2025-12-04T12:12:57.9497697Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1564s] [100%] 2025-12-04T12:12:57.9498639Z 2025-12-04T12:12:57.9498795Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9499623Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9500437Z Traceback (most recent call last): 2025-12-04T12:12:57.9501421Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9502220Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9502754Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9503103Z 2025-12-04T12:12:57.9503313Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9504591Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9505652Z 2025-12-04T12:12:57.9506014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9506626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9507137Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9507466Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9508002Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9508689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9509144Z graph_break [] 2025-12-04T12:12:57.9509496Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9510568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9511516Z warnings.warn( 2025-12-04T12:12:57.9512275Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9513080Z Traceback (most recent call last): 2025-12-04T12:12:57.9513777Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9514565Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9515104Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9515438Z 2025-12-04T12:12:57.9515690Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9516964Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9518040Z 2025-12-04T12:12:57.9518303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9518919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9519374Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9519701Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9520255Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9520937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9521371Z graph_break [] 2025-12-04T12:12:57.9521738Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9522873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9523809Z warnings.warn( 2025-12-04T12:12:57.9524182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9524643Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9524972Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9525382Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9526064Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9526634Z graph_break [] 2025-12-04T12:12:57.9526981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9528039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9528983Z warnings.warn( 2025-12-04T12:12:57.9529283Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9530112Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:57.9530925Z Traceback (most recent call last): 2025-12-04T12:12:57.9531660Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9532438Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9533007Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9533349Z 2025-12-04T12:12:57.9533556Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9534832Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9535899Z 2025-12-04T12:12:57.9536158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9536763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9537221Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9537547Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9538111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9538796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9539240Z graph_break [] 2025-12-04T12:12:57.9539588Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9540651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9541632Z warnings.warn( 
2025-12-04T12:12:57.9542001Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9542445Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9542765Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9543191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9543866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9544429Z graph_break [] 2025-12-04T12:12:57.9544782Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9545843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9546770Z warnings.warn( 2025-12-04T12:12:57.9547136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9547594Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9547913Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9548337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9549015Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9549575Z graph_break [] 2025-12-04T12:12:57.9549925Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9551000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9551943Z warnings.warn( 2025-12-04T12:12:57.9552888Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml - 2025-12-04T12:12:57.9553989Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9555366Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9556560Z 2025-12-04T12:12:57.9556782Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9558098Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9559189Z 2025-12-04T12:12:57.9559446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9560026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9560536Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.9560956Z Got exit code 1 2025-12-04T12:12:57.9561968Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:57.9563426Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9564634Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml 2025-12-04T12:12:57.9565537Z ============================= test session starts ============================== 2025-12-04T12:12:57.9566186Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9566771Z cachedir: .pytest_cache 2025-12-04T12:12:57.9567460Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9568257Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9568598Z configfile: pytest.ini 2025-12-04T12:12:57.9569352Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9570283Z collecting ... collected 380 items / 154 deselected / 226 selected 2025-12-04T12:12:57.9570768Z stepcurrent: skipping 154 already run items. 2025-12-04T12:12:57.9571148Z Running 21 items in this shard 2025-12-04T12:12:57.9571354Z 2025-12-04T12:12:57.9572382Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [ 4%] 2025-12-04T12:12:57.9574441Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5479s] [ 9%] 2025-12-04T12:12:57.9576349Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1578s] [ 9%] 2025-12-04T12:12:57.9578201Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [ 9%] 2025-12-04T12:12:57.9579171Z 2025-12-04T12:12:57.9579307Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9580146Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9580945Z Traceback (most recent call last): 2025-12-04T12:12:57.9581636Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9582427Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9582956Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9583283Z 2025-12-04T12:12:57.9583488Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9584802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9585869Z 2025-12-04T12:12:57.9586125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9586766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9587216Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9587535Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9588079Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9588752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9589195Z graph_break [] 2025-12-04T12:12:57.9589556Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9590663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9591597Z warnings.warn( 2025-12-04T12:12:57.9592323Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9593134Z Traceback (most recent call last): 2025-12-04T12:12:57.9593830Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:57.9594609Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9595179Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9595507Z 2025-12-04T12:12:57.9595732Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9596997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9598068Z 2025-12-04T12:12:57.9598329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9598941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9599409Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9599724Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9600263Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9601116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9601570Z graph_break [] 2025-12-04T12:12:57.9601921Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9603053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9604002Z warnings.warn( 2025-12-04T12:12:57.9604367Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9604824Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9605154Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9605569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9606251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9606804Z graph_break [] 2025-12-04T12:12:57.9607160Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9608210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9609154Z warnings.warn( 2025-12-04T12:12:57.9609453Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9610297Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9611169Z Traceback (most recent call last): 2025-12-04T12:12:57.9611868Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9612693Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9613208Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9613555Z 2025-12-04T12:12:57.9613763Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9615038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9616093Z 2025-12-04T12:12:57.9616363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9616995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9617461Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9617781Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9618317Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9618984Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9619426Z graph_break [] 2025-12-04T12:12:57.9619786Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9620880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9621822Z warnings.warn( 2025-12-04T12:12:57.9622185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9622644Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9622962Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9623389Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9624069Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9624623Z graph_break [] 2025-12-04T12:12:57.9624982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9626045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9626983Z warnings.warn( 2025-12-04T12:12:57.9627343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9627802Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9628127Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9628537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9629210Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9629772Z graph_break [] 2025-12-04T12:12:57.9630120Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9631182Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9632122Z warnings.warn( 2025-12-04T12:12:57.9633076Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml - 2025-12-04T12:12:57.9634164Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9635538Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9636747Z 2025-12-04T12:12:57.9636997Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9638276Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9639364Z 2025-12-04T12:12:57.9639637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9640195Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9640722Z ============ 1 failed, 1 skipped, 154 deselected, 2 rerun in 4.92s ============= 2025-12-04T12:12:57.9641174Z Got exit code 1 2025-12-04T12:12:57.9641428Z Retrying single test... 
2025-12-04T12:12:57.9642305Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml 2025-12-04T12:12:57.9643271Z ============================= test session starts ============================== 2025-12-04T12:12:57.9643921Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9644496Z cachedir: .pytest_cache 2025-12-04T12:12:57.9645177Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9645942Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9646284Z configfile: pytest.ini 2025-12-04T12:12:57.9647110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9648046Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9649424Z stepcurrent: skipping 155 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9650685Z Running 1 items in this shard 2025-12-04T12:12:57.9650901Z 2025-12-04T12:12:57.9651800Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5961s] [100%] 2025-12-04T12:12:57.9653734Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1653s] [100%] 2025-12-04T12:12:57.9655578Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [100%] 2025-12-04T12:12:57.9656528Z 2025-12-04T12:12:57.9656675Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9657512Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9658322Z Traceback (most recent call last): 2025-12-04T12:12:57.9659026Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9659820Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9660350Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9660696Z 2025-12-04T12:12:57.9660910Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9662203Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9663268Z 2025-12-04T12:12:57.9663547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9664201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9664669Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9665035Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9665570Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9666261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9666721Z graph_break [] 2025-12-04T12:12:57.9667092Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9668157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9669115Z warnings.warn( 2025-12-04T12:12:57.9669833Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9670707Z Traceback (most recent call last): 2025-12-04T12:12:57.9671394Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9672187Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9672724Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9673058Z 2025-12-04T12:12:57.9673270Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9674719Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9675789Z 2025-12-04T12:12:57.9676050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9676668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9677122Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9677455Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9678000Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9678689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9679124Z graph_break [] 2025-12-04T12:12:57.9679489Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9680559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9681496Z warnings.warn( 2025-12-04T12:12:57.9681870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9682404Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9682730Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9683146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9683831Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9684400Z graph_break [] 2025-12-04T12:12:57.9684750Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9685819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9686770Z warnings.warn( 2025-12-04T12:12:57.9687071Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9687909Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9688731Z Traceback (most recent call last): 2025-12-04T12:12:57.9689425Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9690273Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9690820Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9691202Z 2025-12-04T12:12:57.9691415Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9692702Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9693769Z 2025-12-04T12:12:57.9694027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9694639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9695102Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9695438Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9696094Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9696791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9697247Z graph_break [] 2025-12-04T12:12:57.9697604Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9698684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9699641Z warnings.warn( 
2025-12-04T12:12:57.9700057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9700510Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9700998Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9701437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9701768Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9701863Z graph_break [] 2025-12-04T12:12:57.9702090Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9702802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9702917Z warnings.warn( 2025-12-04T12:12:57.9703131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9703237Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9703367Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9703582Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9703911Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9704019Z graph_break [] 2025-12-04T12:12:57.9704225Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9704949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9705047Z warnings.warn( 2025-12-04T12:12:57.9705848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml - 2025-12-04T12:12:57.9706024Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9707090Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9707099Z 2025-12-04T12:12:57.9707320Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9708325Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9708331Z 2025-12-04T12:12:57.9708631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9708820Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9709014Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ================== 2025-12-04T12:12:57.9709123Z Got exit code 1 2025-12-04T12:12:57.9709233Z Retrying single test... 
2025-12-04T12:12:57.9709855Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml 2025-12-04T12:12:57.9710030Z ============================= test session starts ============================== 2025-12-04T12:12:57.9710370Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9710513Z cachedir: .pytest_cache 2025-12-04T12:12:57.9711037Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9711158Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9711275Z configfile: pytest.ini 2025-12-04T12:12:57.9711849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9712071Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9713146Z stepcurrent: skipping 155 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9713260Z Running 1 items in this shard 2025-12-04T12:12:57.9713265Z 2025-12-04T12:12:57.9714184Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5367s] [100%] 2025-12-04T12:12:57.9715083Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1580s] [100%] 2025-12-04T12:12:57.9715917Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1560s] [100%] 2025-12-04T12:12:57.9715925Z 2025-12-04T12:12:57.9716062Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9716620Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9716752Z Traceback (most recent call last): 2025-12-04T12:12:57.9717216Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9717424Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9717632Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9717637Z 2025-12-04T12:12:57.9717846Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9718797Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9718804Z 2025-12-04T12:12:57.9719063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9719287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9719399Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9719512Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9719905Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9720122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9720247Z graph_break [] 2025-12-04T12:12:57.9720471Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9721190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9721303Z warnings.warn( 2025-12-04T12:12:57.9721863Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9721981Z Traceback (most recent call last): 2025-12-04T12:12:57.9722522Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9722757Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9722971Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9722993Z 2025-12-04T12:12:57.9723201Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9724139Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9724178Z 2025-12-04T12:12:57.9724453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9724672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9724782Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9724913Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9725248Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9725478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9725575Z graph_break [] 2025-12-04T12:12:57.9725784Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9726515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9726616Z warnings.warn( 2025-12-04T12:12:57.9726826Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9726951Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9727066Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9727293Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9727622Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9727718Z graph_break [] 2025-12-04T12:12:57.9727942Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9728652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9728757Z warnings.warn( 2025-12-04T12:12:57.9728913Z 
=================================== FAILURES =================================== 2025-12-04T12:12:57.9729475Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:57.9729615Z Traceback (most recent call last): 2025-12-04T12:12:57.9730078Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9730270Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9730496Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9730501Z 2025-12-04T12:12:57.9730714Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9731709Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9731764Z 2025-12-04T12:12:57.9732025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9732237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9732364Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9732479Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9732823Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9733037Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9733139Z graph_break [] 2025-12-04T12:12:57.9733398Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9734116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9734218Z warnings.warn( 
2025-12-04T12:12:57.9734442Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9734552Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9734665Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9734927Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9735253Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9735362Z graph_break [] 2025-12-04T12:12:57.9735566Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9736280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9736391Z warnings.warn( 2025-12-04T12:12:57.9736597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9736717Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9736827Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9737040Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9737379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9737476Z graph_break [] 2025-12-04T12:12:57.9737682Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9738400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9738494Z warnings.warn( 2025-12-04T12:12:57.9739306Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml - 2025-12-04T12:12:57.9739477Z =========================== short test summary info 
============================ 2025-12-04T12:12:57.9740534Z FAILED [0.1560s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9740542Z 2025-12-04T12:12:57.9740764Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9741697Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9741703Z 2025-12-04T12:12:57.9741976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9742184Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9742379Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:57.9742525Z Got exit code 1 2025-12-04T12:12:57.9743374Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:57.9743788Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9744410Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml 2025-12-04T12:12:57.9744570Z ============================= test session starts ============================== 2025-12-04T12:12:57.9744958Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9745069Z cachedir: .pytest_cache 2025-12-04T12:12:57.9745588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9745711Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9745818Z configfile: pytest.ini 2025-12-04T12:12:57.9746404Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9746658Z collecting ... collected 380 items / 156 deselected / 224 selected 2025-12-04T12:12:57.9746800Z stepcurrent: skipping 156 already run items. 2025-12-04T12:12:57.9746923Z Running 19 items in this shard 2025-12-04T12:12:57.9746928Z 2025-12-04T12:12:57.9747948Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [ 5%] 2025-12-04T12:12:57.9748972Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 10%] 2025-12-04T12:12:57.9749966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0038s] (Skip non-critical tests to save resources.) 
[ 15%] 2025-12-04T12:12:57.9750864Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.6001s] [ 21%] 2025-12-04T12:12:57.9751751Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1675s] [ 21%] 2025-12-04T12:12:57.9752558Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1639s] [ 21%] 2025-12-04T12:12:57.9752580Z 2025-12-04T12:12:57.9752715Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9753261Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9753392Z Traceback (most recent call last): 2025-12-04T12:12:57.9753852Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9754047Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9754263Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9754268Z 2025-12-04T12:12:57.9754478Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9755467Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9755502Z 2025-12-04T12:12:57.9755763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9755989Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:57.9756101Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9756215Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9756558Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9756770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9756866Z graph_break [] 2025-12-04T12:12:57.9757120Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9759784Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9759946Z return x.grad, w.grad 2025-12-04T12:12:57.9760666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9760779Z warnings.warn( 2025-12-04T12:12:57.9763736Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. 
(Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9763851Z return x.grad, w.grad 2025-12-04T12:12:57.9764435Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9764561Z Traceback (most recent call last): 2025-12-04T12:12:57.9765051Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9765257Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9765473Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9765478Z 2025-12-04T12:12:57.9765716Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9766668Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9766674Z 2025-12-04T12:12:57.9766959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9767178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9767290Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9767419Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9767763Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9767998Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9768162Z graph_break [] 2025-12-04T12:12:57.9768380Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9771178Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9771291Z return x.grad, w.grad 2025-12-04T12:12:57.9772085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9772190Z warnings.warn( 2025-12-04T12:12:57.9774931Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9775077Z return x.grad, w.grad 2025-12-04T12:12:57.9775418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9775539Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9775654Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9775877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9776219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9776317Z graph_break [] 2025-12-04T12:12:57.9776535Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9779174Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9779295Z return x.grad, w.grad 2025-12-04T12:12:57.9780009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9780110Z warnings.warn( 2025-12-04T12:12:57.9782748Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9782854Z return x.grad, w.grad 2025-12-04T12:12:57.9783007Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9783603Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9783750Z Traceback (most recent call last): 2025-12-04T12:12:57.9784218Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9784410Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9784631Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9784636Z 2025-12-04T12:12:57.9784844Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9785768Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9785786Z 2025-12-04T12:12:57.9786080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9786294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9786418Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9786532Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9786863Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9787088Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9787223Z graph_break [] 2025-12-04T12:12:57.9787436Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9790103Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9790214Z return x.grad, w.grad 2025-12-04T12:12:57.9790937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9791039Z warnings.warn( 2025-12-04T12:12:57.9793691Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9793795Z return x.grad, w.grad 2025-12-04T12:12:57.9794022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9794130Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9794242Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9794467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9794799Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9794892Z graph_break [] 2025-12-04T12:12:57.9795112Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9797785Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9797928Z return x.grad, w.grad 2025-12-04T12:12:57.9798639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9798750Z warnings.warn( 2025-12-04T12:12:57.9801658Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9801782Z return x.grad, w.grad 2025-12-04T12:12:57.9801996Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9802201Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9802329Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9802551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9802885Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9803010Z graph_break [] 2025-12-04T12:12:57.9803226Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9803957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9804062Z warnings.warn( 2025-12-04T12:12:57.9806693Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9806816Z return x.grad, w.grad 2025-12-04T12:12:57.9807616Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml - 2025-12-04T12:12:57.9807803Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9808871Z FAILED [0.1639s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9808879Z 2025-12-04T12:12:57.9809103Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9810031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9810037Z 2025-12-04T12:12:57.9810296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9810541Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9810760Z ============ 1 failed, 3 skipped, 156 deselected, 2 rerun in 5.00s ============= 2025-12-04T12:12:57.9810917Z Got exit code 1 2025-12-04T12:12:57.9811023Z Retrying single test... 
2025-12-04T12:12:57.9811648Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml 2025-12-04T12:12:57.9811823Z ============================= test session starts ============================== 2025-12-04T12:12:57.9812166Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9812273Z cachedir: .pytest_cache 2025-12-04T12:12:57.9812795Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9812915Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9813085Z configfile: pytest.ini 2025-12-04T12:12:57.9813663Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9813888Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9814914Z stepcurrent: skipping 159 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9815062Z Running 1 items in this shard 2025-12-04T12:12:57.9815067Z 2025-12-04T12:12:57.9815970Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5689s] [100%] 2025-12-04T12:12:57.9816863Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1644s] [100%] 2025-12-04T12:12:57.9817685Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1624s] [100%] 2025-12-04T12:12:57.9817693Z 2025-12-04T12:12:57.9817829Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9818378Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9818513Z Traceback (most recent call last): 2025-12-04T12:12:57.9818972Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9819184Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9819388Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9819395Z 2025-12-04T12:12:57.9819605Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9820544Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9820552Z 2025-12-04T12:12:57.9820811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9821041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9821155Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9821267Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9821610Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9821824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9821919Z graph_break [] 2025-12-04T12:12:57.9822177Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9824822Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9824971Z return x.grad, w.grad 2025-12-04T12:12:57.9825688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9825798Z warnings.warn( 2025-12-04T12:12:57.9828448Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9828608Z return x.grad, w.grad 2025-12-04T12:12:57.9829151Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9829269Z Traceback (most recent call last): 2025-12-04T12:12:57.9829737Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9829932Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9830138Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9830145Z 2025-12-04T12:12:57.9830364Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9831287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9831295Z 2025-12-04T12:12:57.9831567Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9831777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9831887Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9832013Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9832346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9832576Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.9832671Z graph_break [] 2025-12-04T12:12:57.9832881Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9835532Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9835637Z return x.grad, w.grad 2025-12-04T12:12:57.9836404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9836530Z warnings.warn( 2025-12-04T12:12:57.9839168Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9839273Z return x.grad, w.grad 2025-12-04T12:12:57.9839488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9839644Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9839764Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9839991Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9840326Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9840421Z graph_break [] 2025-12-04T12:12:57.9840642Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9843347Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9843507Z return x.grad, w.grad 2025-12-04T12:12:57.9844219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9844320Z warnings.warn( 2025-12-04T12:12:57.9846974Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9847081Z return x.grad, w.grad 2025-12-04T12:12:57.9847238Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9847787Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9847924Z Traceback (most recent call last): 2025-12-04T12:12:57.9848381Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9848576Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9848803Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9848808Z 2025-12-04T12:12:57.9849017Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9849942Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9849961Z 2025-12-04T12:12:57.9850254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9850493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9850617Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9850729Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9851058Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9851285Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9851380Z graph_break [] 2025-12-04T12:12:57.9851605Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9854274Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9854393Z return x.grad, w.grad 2025-12-04T12:12:57.9855107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9855239Z warnings.warn( 2025-12-04T12:12:57.9857887Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9857992Z return x.grad, w.grad 2025-12-04T12:12:57.9858215Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9858324Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9858435Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9858667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9858998Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9859093Z graph_break [] 2025-12-04T12:12:57.9859315Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9861956Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9862076Z return x.grad, w.grad 2025-12-04T12:12:57.9862787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9862898Z warnings.warn( 2025-12-04T12:12:57.9865562Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9865710Z return x.grad, w.grad 2025-12-04T12:12:57.9865921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9866030Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9866153Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9866370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9866698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9866805Z graph_break [] 2025-12-04T12:12:57.9867050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9867780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9867880Z warnings.warn( 2025-12-04T12:12:57.9870509Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9870655Z return x.grad, w.grad 2025-12-04T12:12:57.9871456Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml - 2025-12-04T12:12:57.9871638Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9872686Z FAILED [0.1624s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9872693Z 2025-12-04T12:12:57.9872916Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9873837Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9873842Z 2025-12-04T12:12:57.9874112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9874290Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:57.9874486Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ================== 2025-12-04T12:12:57.9874595Z Got exit code 1 2025-12-04T12:12:57.9874699Z Retrying single test... 
2025-12-04T12:12:57.9875323Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml 2025-12-04T12:12:57.9875495Z ============================= test session starts ============================== 2025-12-04T12:12:57.9875837Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9875957Z cachedir: .pytest_cache 2025-12-04T12:12:57.9876466Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9876587Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9876751Z configfile: pytest.ini 2025-12-04T12:12:57.9877331Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9877585Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:57.9878603Z stepcurrent: skipping 159 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9878718Z Running 1 items in this shard 2025-12-04T12:12:57.9878723Z 2025-12-04T12:12:57.9879615Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5787s] [100%] 2025-12-04T12:12:57.9880549Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1654s] [100%] 2025-12-04T12:12:57.9881372Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1650s] [100%] 2025-12-04T12:12:57.9881377Z 2025-12-04T12:12:57.9881519Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9882101Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9882296Z Traceback (most recent call last): 2025-12-04T12:12:57.9882755Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9882963Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9883171Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9883178Z 2025-12-04T12:12:57.9883386Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9884322Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9884328Z 2025-12-04T12:12:57.9884587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9884815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9884923Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9885034Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9885379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9885595Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9885693Z graph_break [] 2025-12-04T12:12:57.9885916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9888573Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9888693Z return x.grad, w.grad 2025-12-04T12:12:57.9889409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9889520Z warnings.warn( 2025-12-04T12:12:57.9892201Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9892349Z return x.grad, w.grad 2025-12-04T12:12:57.9892900Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9893020Z Traceback (most recent call last): 2025-12-04T12:12:57.9893524Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9893722Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9893935Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9893955Z 2025-12-04T12:12:57.9894167Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9895093Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9895134Z 2025-12-04T12:12:57.9895408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9895621Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9895730Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9895858Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9896194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9896425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:57.9896525Z graph_break [] 2025-12-04T12:12:57.9896734Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9899390Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9899500Z return x.grad, w.grad 2025-12-04T12:12:57.9900234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9900336Z warnings.warn( 2025-12-04T12:12:57.9903185Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9903294Z return x.grad, w.grad 2025-12-04T12:12:57.9903510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9903707Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9903823Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9904178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9904512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9904609Z graph_break [] 2025-12-04T12:12:57.9904834Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9907517Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9907640Z return x.grad, w.grad 2025-12-04T12:12:57.9908359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9908460Z warnings.warn( 2025-12-04T12:12:57.9911109Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9911262Z return x.grad, w.grad 2025-12-04T12:12:57.9911422Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9911975Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _ 2025-12-04T12:12:57.9912109Z Traceback (most recent call last): 2025-12-04T12:12:57.9912566Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9912762Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9912989Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9912994Z 2025-12-04T12:12:57.9913205Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9914152Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9914158Z 2025-12-04T12:12:57.9914416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9914631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9914756Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9914869Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9915201Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9915429Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9915523Z graph_break [] 2025-12-04T12:12:57.9915744Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9918425Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9918570Z return x.grad, w.grad 2025-12-04T12:12:57.9919280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9919379Z warnings.warn( 2025-12-04T12:12:57.9922069Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9922227Z return x.grad, w.grad 2025-12-04T12:12:57.9922455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9922596Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9922708Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9922941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9923274Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9923388Z graph_break [] 2025-12-04T12:12:57.9923601Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9926238Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9926361Z return x.grad, w.grad 2025-12-04T12:12:57.9927080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9927192Z warnings.warn( 2025-12-04T12:12:57.9929831Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9929953Z return x.grad, w.grad 2025-12-04T12:12:57.9930166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9930277Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9930405Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9930625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9930965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9931061Z graph_break [] 2025-12-04T12:12:57.9931358Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9932136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9932234Z warnings.warn( 2025-12-04T12:12:57.9934861Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9934999Z return x.grad, w.grad 2025-12-04T12:12:57.9935808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml - 2025-12-04T12:12:57.9935992Z =========================== short test summary info ============================ 2025-12-04T12:12:57.9937044Z FAILED [0.1650s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9937081Z 2025-12-04T12:12:57.9937304Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9938231Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9938238Z 2025-12-04T12:12:57.9938513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9938687Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:57.9938885Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ================== 2025-12-04T12:12:57.9938992Z Got exit code 1 2025-12-04T12:12:57.9939829Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True 2025-12-04T12:12:57.9940229Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:57.9940869Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml 2025-12-04T12:12:57.9941028Z ============================= test session starts ============================== 2025-12-04T12:12:57.9941383Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:57.9941487Z cachedir: .pytest_cache 2025-12-04T12:12:57.9941995Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:57.9942127Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:57.9942234Z configfile: pytest.ini 2025-12-04T12:12:57.9942823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:57.9943046Z collecting ... collected 380 items / 160 deselected / 220 selected 2025-12-04T12:12:57.9943189Z stepcurrent: skipping 160 already run items. 
2025-12-04T12:12:57.9943318Z Running 15 items in this shard 2025-12-04T12:12:57.9943323Z 2025-12-04T12:12:57.9944258Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5536s] [ 6%] 2025-12-04T12:12:57.9945161Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1655s] [ 6%] 2025-12-04T12:12:57.9946001Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1635s] [ 6%] 2025-12-04T12:12:57.9946009Z 2025-12-04T12:12:57.9946145Z ==================================== RERUNS ==================================== 2025-12-04T12:12:57.9946706Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.9946824Z Traceback (most recent call last): 2025-12-04T12:12:57.9947328Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9947524Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9947732Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9947737Z 2025-12-04T12:12:57.9947959Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9948883Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.9948922Z 2025-12-04T12:12:57.9949198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9949410Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9949519Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9949643Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9949982Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9950194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9950306Z graph_break [] 2025-12-04T12:12:57.9950515Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9953163Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9953269Z return x.grad, w.grad 2025-12-04T12:12:57.9954004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9954105Z warnings.warn( 2025-12-04T12:12:57.9956750Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. 
See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9956867Z return x.grad, w.grad 2025-12-04T12:12:57.9957414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.9957580Z Traceback (most recent call last): 2025-12-04T12:12:57.9958037Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9958274Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9958480Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9958485Z 2025-12-04T12:12:57.9958691Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9959635Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.9959642Z 2025-12-04T12:12:57.9959899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9960158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9960268Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9960379Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9960725Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9960939Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9961033Z graph_break [] 2025-12-04T12:12:57.9961259Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9963998Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute 
of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9964118Z return x.grad, w.grad 2025-12-04T12:12:57.9964836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9964949Z warnings.warn( 2025-12-04T12:12:57.9967582Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9967702Z return x.grad, w.grad 2025-12-04T12:12:57.9967914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9968025Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9968150Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9968364Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9968695Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9968803Z graph_break [] 2025-12-04T12:12:57.9969013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9971704Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9971842Z return x.grad, w.grad 2025-12-04T12:12:57.9972554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9972666Z warnings.warn( 2025-12-04T12:12:57.9975318Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9975440Z return x.grad, w.grad 2025-12-04T12:12:57.9975582Z =================================== FAILURES =================================== 2025-12-04T12:12:57.9976144Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:57.9976296Z Traceback (most recent call last): 2025-12-04T12:12:57.9976751Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:57.9976961Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:57.9977169Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:57.9977175Z 2025-12-04T12:12:57.9977399Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:57.9978331Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:57.9978338Z 2025-12-04T12:12:57.9978595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:57.9978819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9978930Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9979055Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9979387Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9979598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9979704Z graph_break [] 2025-12-04T12:12:57.9979913Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9982559Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9982680Z return x.grad, w.grad 2025-12-04T12:12:57.9983393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9983504Z warnings.warn( 2025-12-04T12:12:57.9986187Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9986333Z return x.grad, w.grad 2025-12-04T12:12:57.9986552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9986659Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9986803Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9987019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9987398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9987497Z graph_break [] 2025-12-04T12:12:57.9987711Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9990371Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9990509Z return x.grad, w.grad 2025-12-04T12:12:57.9991243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9991341Z warnings.warn( 2025-12-04T12:12:57.9993982Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:57.9994091Z return x.grad, w.grad 2025-12-04T12:12:57.9994302Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:57.9994425Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:57.9994539Z stats [('calls_captured', 10)] 2025-12-04T12:12:57.9994773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:57.9995104Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:57.9995204Z graph_break [] 2025-12-04T12:12:57.9995430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:57.9996142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:57.9996243Z warnings.warn( 2025-12-04T12:12:57.9998922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:57.9999059Z return x.grad, w.grad 2025-12-04T12:12:57.9999875Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml - 2025-12-04T12:12:58.0000046Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0001777Z FAILED [0.1635s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0001799Z 2025-12-04T12:12:58.0002035Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0003247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0003257Z 2025-12-04T12:12:58.0003521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0003699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0003911Z ================== 1 failed, 160 deselected, 2 rerun in 4.94s ================== 2025-12-04T12:12:58.0004064Z Got exit code 1 2025-12-04T12:12:58.0004171Z Retrying single test... 
2025-12-04T12:12:58.0004814Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml 2025-12-04T12:12:58.0004976Z ============================= test session starts ============================== 2025-12-04T12:12:58.0005337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0005444Z cachedir: .pytest_cache 2025-12-04T12:12:58.0005958Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0006095Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0006203Z configfile: pytest.ini 2025-12-04T12:12:58.0006777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0007020Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0008040Z stepcurrent: skipping 160 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0008169Z Running 1 items in this shard 2025-12-04T12:12:58.0008175Z 2025-12-04T12:12:58.0009067Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5728s] [100%] 2025-12-04T12:12:58.0009975Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1707s] [100%] 2025-12-04T12:12:58.0010785Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1641s] [100%] 2025-12-04T12:12:58.0010792Z 2025-12-04T12:12:58.0010928Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0011486Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0011604Z Traceback (most recent call last): 2025-12-04T12:12:58.0012139Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0012336Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0012582Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0012587Z 2025-12-04T12:12:58.0012809Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0013734Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0013742Z 2025-12-04T12:12:58.0014014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0014227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0014337Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0014489Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0014825Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0015056Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0015150Z graph_break [] 2025-12-04T12:12:58.0015358Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0018025Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0018161Z return x.grad, w.grad 2025-12-04T12:12:58.0018891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0018990Z warnings.warn( 2025-12-04T12:12:58.0021645Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0021749Z return x.grad, w.grad 2025-12-04T12:12:58.0022300Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0022432Z Traceback (most recent call last): 2025-12-04T12:12:58.0022888Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0023096Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0023304Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0023311Z 2025-12-04T12:12:58.0023519Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0024462Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0024467Z 2025-12-04T12:12:58.0024726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0024987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0025098Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0025237Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0025583Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0025795Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:58.0025890Z graph_break [] 2025-12-04T12:12:58.0026113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0028794Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0028916Z return x.grad, w.grad 2025-12-04T12:12:58.0029627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0029736Z warnings.warn( 2025-12-04T12:12:58.0032399Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0032517Z return x.grad, w.grad 2025-12-04T12:12:58.0032730Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0032840Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0032965Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0033182Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0033512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0033622Z graph_break [] 2025-12-04T12:12:58.0033832Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0036476Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0036583Z return x.grad, w.grad 2025-12-04T12:12:58.0037315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0037416Z warnings.warn( 2025-12-04T12:12:58.0040158Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0040310Z return x.grad, w.grad 2025-12-04T12:12:58.0040455Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0041019Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0041140Z Traceback (most recent call last): 2025-12-04T12:12:58.0041600Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0041808Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0042017Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0042022Z 2025-12-04T12:12:58.0042344Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0043275Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0043283Z 2025-12-04T12:12:58.0043545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0043773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0043929Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0044057Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0044387Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0044600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0044709Z graph_break [] 2025-12-04T12:12:58.0044921Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0047596Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0047705Z return x.grad, w.grad 2025-12-04T12:12:58.0048418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0048530Z warnings.warn( 2025-12-04T12:12:58.0051165Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0051287Z return x.grad, w.grad 2025-12-04T12:12:58.0051500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0051622Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0051732Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0051951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0052325Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0052423Z graph_break [] 2025-12-04T12:12:58.0052634Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0055331Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0055435Z return x.grad, w.grad 2025-12-04T12:12:58.0056196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0056297Z warnings.warn( 2025-12-04T12:12:58.0058939Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0059075Z return x.grad, w.grad 2025-12-04T12:12:58.0059283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0059400Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0059514Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0059745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0060075Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0060171Z graph_break [] 2025-12-04T12:12:58.0060390Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0061106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0061206Z warnings.warn( 2025-12-04T12:12:58.0063852Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0063959Z return x.grad, w.grad 2025-12-04T12:12:58.0064772Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml - 2025-12-04T12:12:58.0064943Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0066006Z FAILED [0.1641s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0066011Z 2025-12-04T12:12:58.0066223Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0067191Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0067223Z 2025-12-04T12:12:58.0067485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0067660Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0067867Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ================== 2025-12-04T12:12:58.0067966Z Got exit code 1 2025-12-04T12:12:58.0068070Z Retrying single test... 
2025-12-04T12:12:58.0068711Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml 2025-12-04T12:12:58.0068875Z ============================= test session starts ============================== 2025-12-04T12:12:58.0069264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0069372Z cachedir: .pytest_cache 2025-12-04T12:12:58.0069885Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0070021Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0070128Z configfile: pytest.ini 2025-12-04T12:12:58.0070715Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0070968Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0071980Z stepcurrent: skipping 160 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0072102Z Running 1 items in this shard 2025-12-04T12:12:58.0072109Z 2025-12-04T12:12:58.0072998Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.6072s] [100%] 2025-12-04T12:12:58.0073895Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1642s] [100%] 2025-12-04T12:12:58.0082054Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1616s] [100%] 2025-12-04T12:12:58.0082089Z 2025-12-04T12:12:58.0082433Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0083013Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0083142Z Traceback (most recent call last): 2025-12-04T12:12:58.0083634Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0083837Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0084044Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0084050Z 2025-12-04T12:12:58.0084276Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0085214Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0085219Z 2025-12-04T12:12:58.0085495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0085715Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0085831Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0086087Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0086425Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0086679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0086791Z graph_break [] 2025-12-04T12:12:58.0087006Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0089720Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0089829Z return x.grad, w.grad 2025-12-04T12:12:58.0090563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0090707Z warnings.warn( 2025-12-04T12:12:58.0093339Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. 
Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0093501Z return x.grad, w.grad 2025-12-04T12:12:58.0094053Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0094187Z Traceback (most recent call last): 2025-12-04T12:12:58.0094641Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0094837Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0095049Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0095057Z 2025-12-04T12:12:58.0095269Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0096207Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0096213Z 2025-12-04T12:12:58.0096476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0096693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0096817Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0096928Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0097272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0097485Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 
2025-12-04T12:12:58.0097581Z graph_break [] 2025-12-04T12:12:58.0097802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0100479Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0100630Z return x.grad, w.grad 2025-12-04T12:12:58.0101710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0101816Z warnings.warn( 2025-12-04T12:12:58.0104568Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0104676Z return x.grad, w.grad 2025-12-04T12:12:58.0104904Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0105011Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0105129Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0105365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0105748Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0105860Z graph_break [] 2025-12-04T12:12:58.0106071Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0108713Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0108830Z return x.grad, w.grad 2025-12-04T12:12:58.0109548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0109659Z warnings.warn( 2025-12-04T12:12:58.0112297Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0112415Z return x.grad, w.grad 2025-12-04T12:12:58.0112556Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0113105Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _ 2025-12-04T12:12:58.0113235Z Traceback (most recent call last): 2025-12-04T12:12:58.0113694Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0113897Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0114105Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0114166Z 2025-12-04T12:12:58.0114379Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0115362Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0115368Z 2025-12-04T12:12:58.0115629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0115862Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0115974Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0116087Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0116436Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0116655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0116800Z graph_break [] 2025-12-04T12:12:58.0117016Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0119671Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0119834Z return x.grad, w.grad 2025-12-04T12:12:58.0120547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0120662Z warnings.warn( 2025-12-04T12:12:58.0123362Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0123490Z return x.grad, w.grad 2025-12-04T12:12:58.0123701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0123812Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0123939Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0124158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0124492Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0124604Z graph_break [] 2025-12-04T12:12:58.0124814Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0127463Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0127568Z return x.grad, w.grad 2025-12-04T12:12:58.0128341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0128469Z warnings.warn( 2025-12-04T12:12:58.0131103Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. 
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 2025-12-04T12:12:58.0131221Z return x.grad, w.grad 2025-12-04T12:12:58.0131431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0131586Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0131700Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0131917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0132264Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0132360Z graph_break [] 2025-12-04T12:12:58.0132585Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0133300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0133431Z warnings.warn( 2025-12-04T12:12:58.0136077Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.) 
2025-12-04T12:12:58.0136183Z return x.grad, w.grad 2025-12-04T12:12:58.0136993Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml - 2025-12-04T12:12:58.0137164Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0138233Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0138239Z 2025-12-04T12:12:58.0138454Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0139384Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0139404Z 2025-12-04T12:12:58.0139663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0139837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T12:12:58.0140046Z ================== 1 failed, 174 deselected, 2 rerun in 4.99s ================== 2025-12-04T12:12:58.0140144Z Got exit code 1 2025-12-04T12:12:58.0140988Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True 2025-12-04T12:12:58.0141401Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:58.0142060Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml 2025-12-04T12:12:58.0142261Z ============================= test session starts ============================== 2025-12-04T12:12:58.0142606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0142714Z cachedir: .pytest_cache 2025-12-04T12:12:58.0143235Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0143356Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0143461Z configfile: pytest.ini 2025-12-04T12:12:58.0144050Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0144301Z collecting ... collected 380 items / 161 deselected / 219 selected 2025-12-04T12:12:58.0144463Z stepcurrent: skipping 161 already run items. 2025-12-04T12:12:58.0144576Z Running 14 items in this shard 2025-12-04T12:12:58.0144583Z 2025-12-04T12:12:58.0145584Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) 
[ 7%] 2025-12-04T12:12:58.0146485Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5574s] [ 14%] 2025-12-04T12:12:58.0147416Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [ 14%] 2025-12-04T12:12:58.0148243Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1595s] [ 14%] 2025-12-04T12:12:58.0148251Z 2025-12-04T12:12:58.0148387Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0148951Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0149070Z Traceback (most recent call last): 2025-12-04T12:12:58.0149536Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0149748Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0149956Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0149961Z 2025-12-04T12:12:58.0150182Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0151117Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0151124Z 2025-12-04T12:12:58.0151384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0151613Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:58.0151722Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0151847Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0152178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0152392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0152502Z graph_break [] 2025-12-04T12:12:58.0152711Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0153498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0153612Z warnings.warn( 2025-12-04T12:12:58.0154211Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0154343Z Traceback (most recent call last): 2025-12-04T12:12:58.0154799Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0154993Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0155212Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0155218Z 2025-12-04T12:12:58.0155426Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0156410Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0156418Z 2025-12-04T12:12:58.0156679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0156895Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0157015Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:58.0157129Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0157457Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0157716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0157812Z graph_break [] 2025-12-04T12:12:58.0158039Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0158754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0158854Z warnings.warn( 2025-12-04T12:12:58.0159080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0159187Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0159302Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0159527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0159857Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0159966Z graph_break [] 2025-12-04T12:12:58.0160182Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0160890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0161001Z warnings.warn( 2025-12-04T12:12:58.0161141Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0161697Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0161829Z Traceback (most recent call last): 2025-12-04T12:12:58.0162355Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:58.0162570Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0162777Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0162785Z 2025-12-04T12:12:58.0162992Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0163935Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0163942Z 2025-12-04T12:12:58.0164200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0164478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0164590Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0164735Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0165084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0165300Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0165394Z graph_break [] 2025-12-04T12:12:58.0165621Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0166341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0166452Z warnings.warn( 2025-12-04T12:12:58.0166663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0166772Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0166935Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0167150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0167482Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0167592Z graph_break [] 2025-12-04T12:12:58.0167801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0168524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0168660Z warnings.warn( 2025-12-04T12:12:58.0168868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0168991Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0169099Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0169313Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0169657Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0169753Z graph_break [] 2025-12-04T12:12:58.0169977Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0170686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0170782Z warnings.warn( 2025-12-04T12:12:58.0171597Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml - 2025-12-04T12:12:58.0171764Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0172839Z FAILED [0.1595s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0172847Z 2025-12-04T12:12:58.0173056Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:58.0174000Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0174017Z 2025-12-04T12:12:58.0174282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0174460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0174690Z ============ 1 failed, 1 skipped, 161 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:58.0174789Z Got exit code 1 2025-12-04T12:12:58.0174894Z Retrying single test... 2025-12-04T12:12:58.0175534Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml 2025-12-04T12:12:58.0175735Z ============================= test session starts ============================== 2025-12-04T12:12:58.0176120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0176229Z cachedir: .pytest_cache 2025-12-04T12:12:58.0176738Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0176871Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0176979Z configfile: pytest.ini 2025-12-04T12:12:58.0177551Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0177791Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0178841Z stepcurrent: skipping 162 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0178973Z Running 1 items in this shard 2025-12-04T12:12:58.0178981Z 2025-12-04T12:12:58.0179879Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5526s] [100%] 2025-12-04T12:12:58.0180783Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%] 2025-12-04T12:12:58.0181630Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%] 2025-12-04T12:12:58.0181635Z 2025-12-04T12:12:58.0181775Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0182346Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0182471Z Traceback (most recent call last): 2025-12-04T12:12:58.0182944Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0183136Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0183344Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0183349Z 2025-12-04T12:12:58.0183573Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0184497Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0184502Z 2025-12-04T12:12:58.0184774Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0184987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0185099Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0185227Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0185559Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0185773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0185881Z graph_break [] 2025-12-04T12:12:58.0186092Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0186820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0186918Z warnings.warn( 2025-12-04T12:12:58.0187477Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0187652Z Traceback (most recent call last): 2025-12-04T12:12:58.0188112Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0188349Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0188555Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0188560Z 2025-12-04T12:12:58.0188767Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0189707Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0189713Z 2025-12-04T12:12:58.0189973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0190227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0190339Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0190455Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0190804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0191019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0191115Z graph_break [] 2025-12-04T12:12:58.0191344Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0192092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0192207Z warnings.warn( 2025-12-04T12:12:58.0192416Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0192524Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0192649Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0192870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0193200Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0193312Z graph_break [] 2025-12-04T12:12:58.0193519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0194239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0194339Z warnings.warn( 2025-12-04T12:12:58.0194480Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0195046Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0195164Z Traceback (most recent call last): 2025-12-04T12:12:58.0195628Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0195832Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0196042Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0196047Z 2025-12-04T12:12:58.0196265Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0197192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0197199Z 2025-12-04T12:12:58.0197459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0197682Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0197792Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0197916Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0198297Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0198512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0198655Z graph_break [] 2025-12-04T12:12:58.0198866Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0199583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0199697Z warnings.warn( 
2025-12-04T12:12:58.0199907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0200033Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0200144Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0200360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0200731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0201045Z graph_break [] 2025-12-04T12:12:58.0201372Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0202101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0202252Z warnings.warn( 2025-12-04T12:12:58.0202474Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0202674Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0202785Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0203012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0203338Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0203432Z graph_break [] 2025-12-04T12:12:58.0203654Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0204370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0204487Z warnings.warn( 2025-12-04T12:12:58.0205285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml - 2025-12-04T12:12:58.0205452Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0206532Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0206538Z 2025-12-04T12:12:58.0206750Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0207701Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0207710Z 2025-12-04T12:12:58.0207972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0208148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0208358Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:58.0208459Z Got exit code 1 2025-12-04T12:12:58.0208577Z Retrying single test... 
2025-12-04T12:12:58.0209202Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml 2025-12-04T12:12:58.0209362Z ============================= test session starts ============================== 2025-12-04T12:12:58.0209721Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0209877Z cachedir: .pytest_cache 2025-12-04T12:12:58.0210391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0210569Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0210676Z configfile: pytest.ini 2025-12-04T12:12:58.0211261Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0211484Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0212499Z stepcurrent: skipping 162 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0212621Z Running 1 items in this shard 2025-12-04T12:12:58.0212626Z 2025-12-04T12:12:58.0213558Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5434s] [100%] 2025-12-04T12:12:58.0214462Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [100%] 2025-12-04T12:12:58.0215276Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1592s] [100%] 2025-12-04T12:12:58.0215322Z 2025-12-04T12:12:58.0215472Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0216028Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0216148Z Traceback (most recent call last): 2025-12-04T12:12:58.0216627Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0216827Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0217047Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0217052Z 2025-12-04T12:12:58.0217259Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0218193Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0218200Z 2025-12-04T12:12:58.0218472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0218686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0218812Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0218929Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0219263Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0219494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0219591Z graph_break [] 2025-12-04T12:12:58.0219802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0220534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0220634Z warnings.warn( 2025-12-04T12:12:58.0221200Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0221318Z Traceback (most recent call last): 2025-12-04T12:12:58.0221777Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0222021Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0222227Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0222262Z 2025-12-04T12:12:58.0222471Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0223411Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0223419Z 2025-12-04T12:12:58.0223675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0223900Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0224005Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0224144Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0224506Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0224723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0224838Z graph_break [] 2025-12-04T12:12:58.0225051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0225761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0225874Z warnings.warn( 2025-12-04T12:12:58.0226116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0226243Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0226358Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0226573Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0226921Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0227019Z graph_break [] 2025-12-04T12:12:58.0227233Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0227961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0228064Z warnings.warn( 2025-12-04T12:12:58.0228221Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0228775Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0228899Z Traceback (most recent call last): 2025-12-04T12:12:58.0229374Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0229566Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0229773Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0229794Z 2025-12-04T12:12:58.0230005Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0230931Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0230939Z 2025-12-04T12:12:58.0231215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0231430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0231553Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0231666Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0231996Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0232221Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0232315Z graph_break [] 2025-12-04T12:12:58.0232561Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0233287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0233417Z warnings.warn( 
2025-12-04T12:12:58.0233640Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0233748Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0233862Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0234092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0234418Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0234517Z graph_break [] 2025-12-04T12:12:58.0234743Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0235486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0235597Z warnings.warn( 2025-12-04T12:12:58.0235812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0235918Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0236042Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0236253Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0236577Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0236799Z graph_break [] 2025-12-04T12:12:58.0237007Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0237717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0237828Z warnings.warn( 2025-12-04T12:12:58.0238631Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml - 2025-12-04T12:12:58.0238813Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0239874Z FAILED [0.1592s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0239882Z 2025-12-04T12:12:58.0240106Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0241038Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0241043Z 2025-12-04T12:12:58.0241304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0241497Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0241693Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:58.0241806Z Got exit code 1 2025-12-04T12:12:58.0242722Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0243126Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:58.0243765Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml 2025-12-04T12:12:58.0243926Z ============================= test session starts ============================== 2025-12-04T12:12:58.0244322Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0244431Z cachedir: .pytest_cache 2025-12-04T12:12:58.0244940Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0245102Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0245208Z configfile: pytest.ini 2025-12-04T12:12:58.0245783Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0246025Z collecting ... collected 380 items / 163 deselected / 217 selected 2025-12-04T12:12:58.0246169Z stepcurrent: skipping 163 already run items. 2025-12-04T12:12:58.0246296Z Running 12 items in this shard 2025-12-04T12:12:58.0246301Z 2025-12-04T12:12:58.0247340Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 8%] 2025-12-04T12:12:58.0248344Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 16%] 2025-12-04T12:12:58.0249349Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0037s] (Skip non-critical tests to save resources.) 
[ 25%] 2025-12-04T12:12:58.0250274Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5536s] [ 33%] 2025-12-04T12:12:58.0251172Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1618s] [ 33%] 2025-12-04T12:12:58.0251978Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [ 33%] 2025-12-04T12:12:58.0251986Z 2025-12-04T12:12:58.0252135Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0252684Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0252806Z Traceback (most recent call last): 2025-12-04T12:12:58.0253282Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0253476Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0253694Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0253698Z 2025-12-04T12:12:58.0253910Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0254838Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0254859Z 2025-12-04T12:12:58.0255117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0255330Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T12:12:58.0255454Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0255566Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0255895Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0256124Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0256220Z graph_break [] 2025-12-04T12:12:58.0256432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0257197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0257324Z warnings.warn( 2025-12-04T12:12:58.0257890Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0258008Z Traceback (most recent call last): 2025-12-04T12:12:58.0258469Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0258672Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0258876Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0258881Z 2025-12-04T12:12:58.0259099Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0260060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0260068Z 2025-12-04T12:12:58.0260327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0260550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0260660Z frames [('total', 1), ('ok', 1)] 
2025-12-04T12:12:58.0260818Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0261151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0261363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0261471Z graph_break [] 2025-12-04T12:12:58.0261681Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0262399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0262510Z warnings.warn( 2025-12-04T12:12:58.0262719Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0262840Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0262950Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0263162Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0263506Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0263601Z graph_break [] 2025-12-04T12:12:58.0263809Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0264532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0264630Z warnings.warn( 2025-12-04T12:12:58.0264784Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0265335Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0265455Z Traceback (most recent call last): 2025-12-04T12:12:58.0265928Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in 
test_rms_norm_bwd 2025-12-04T12:12:58.0266121Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0266328Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0266345Z 2025-12-04T12:12:58.0266554Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0267483Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0267488Z 2025-12-04T12:12:58.0267796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0268036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0268142Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0268264Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0268595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0268820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0268914Z graph_break [] 2025-12-04T12:12:58.0269121Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0269846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0269943Z warnings.warn( 2025-12-04T12:12:58.0270182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0270301Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0270414Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0270639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0270969Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), 
('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0271062Z graph_break [] 2025-12-04T12:12:58.0271284Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0272025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0272122Z warnings.warn( 2025-12-04T12:12:58.0272342Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0272448Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0272574Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0272791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0273121Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0273227Z graph_break [] 2025-12-04T12:12:58.0273434Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0274142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0274254Z warnings.warn( 2025-12-04T12:12:58.0275054Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml - 2025-12-04T12:12:58.0275233Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0276300Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0276308Z 2025-12-04T12:12:58.0276533Z To execute this test, run the following from the base repo dir: 
2025-12-04T12:12:58.0277464Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0277471Z 2025-12-04T12:12:58.0277731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0277921Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0278134Z ============ 1 failed, 3 skipped, 163 deselected, 2 rerun in 4.94s ============= 2025-12-04T12:12:58.0278229Z Got exit code 1 2025-12-04T12:12:58.0278348Z Retrying single test... 2025-12-04T12:12:58.0279006Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml 2025-12-04T12:12:58.0279205Z ============================= test session starts ============================== 2025-12-04T12:12:58.0279544Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0279648Z cachedir: .pytest_cache 2025-12-04T12:12:58.0280166Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0280289Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0280405Z configfile: pytest.ini 2025-12-04T12:12:58.0280978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0281244Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0282343Z stepcurrent: skipping 166 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0282458Z Running 1 items in this shard 2025-12-04T12:12:58.0282463Z 2025-12-04T12:12:58.0283371Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5473s] [100%] 2025-12-04T12:12:58.0284298Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [100%] 2025-12-04T12:12:58.0285107Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1565s] [100%] 2025-12-04T12:12:58.0285129Z 2025-12-04T12:12:58.0285267Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0285823Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0285959Z Traceback (most recent call last): 2025-12-04T12:12:58.0286419Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0286618Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0286841Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0286846Z 2025-12-04T12:12:58.0287057Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0288006Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0288011Z 2025-12-04T12:12:58.0288273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0288488Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0288614Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0288729Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0289075Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0289294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0289391Z graph_break [] 2025-12-04T12:12:58.0289619Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0290345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0290446Z warnings.warn( 2025-12-04T12:12:58.0291044Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0291194Z Traceback (most recent call last): 2025-12-04T12:12:58.0291674Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0291871Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0292079Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0292084Z 2025-12-04T12:12:58.0292309Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0293237Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0293241Z 2025-12-04T12:12:58.0293548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0293758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0293870Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0294000Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0294331Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0294546Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0294692Z graph_break [] 2025-12-04T12:12:58.0294900Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0295628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0295726Z warnings.warn( 2025-12-04T12:12:58.0295940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0296067Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0296179Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0296393Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0296730Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0296826Z graph_break [] 2025-12-04T12:12:58.0297045Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0297761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0297858Z warnings.warn( 2025-12-04T12:12:58.0298010Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0298563Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0298696Z Traceback (most recent call last): 2025-12-04T12:12:58.0299155Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0299347Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0299567Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0299572Z 2025-12-04T12:12:58.0299781Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0300710Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0300728Z 2025-12-04T12:12:58.0301280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0301490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0301617Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0301817Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0302148Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0302418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0302511Z graph_break [] 2025-12-04T12:12:58.0302733Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0303450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0303548Z warnings.warn( 
2025-12-04T12:12:58.0303769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0303876Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0303987Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0304256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0304596Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0304706Z graph_break [] 2025-12-04T12:12:58.0304916Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0305624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0305778Z warnings.warn( 2025-12-04T12:12:58.0305986Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0306096Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0306219Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0306434Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0306776Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0306871Z graph_break [] 2025-12-04T12:12:58.0307083Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0307800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0307895Z warnings.warn( 2025-12-04T12:12:58.0308695Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml - 2025-12-04T12:12:58.0308875Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0309931Z FAILED [0.1565s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0309937Z 2025-12-04T12:12:58.0310162Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0311092Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0311099Z 2025-12-04T12:12:58.0311373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0311546Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0311741Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:58.0311849Z Got exit code 1 2025-12-04T12:12:58.0311952Z Retrying single test... 
2025-12-04T12:12:58.0312577Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml 2025-12-04T12:12:58.0312754Z ============================= test session starts ============================== 2025-12-04T12:12:58.0313126Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0313279Z cachedir: .pytest_cache 2025-12-04T12:12:58.0313789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0313910Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0314029Z configfile: pytest.ini 2025-12-04T12:12:58.0314606Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0314827Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0315891Z stepcurrent: skipping 166 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0316005Z Running 1 items in this shard 2025-12-04T12:12:58.0316010Z 2025-12-04T12:12:58.0316903Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5529s] [100%] 2025-12-04T12:12:58.0317797Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1621s] [100%] 2025-12-04T12:12:58.0318650Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1610s] [100%] 2025-12-04T12:12:58.0318656Z 2025-12-04T12:12:58.0318796Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0319347Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0319479Z Traceback (most recent call last): 2025-12-04T12:12:58.0319944Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0320151Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0320356Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0320361Z 2025-12-04T12:12:58.0320570Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0321515Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0321520Z 2025-12-04T12:12:58.0321779Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0322007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0322179Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0322299Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0322647Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0322862Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0322971Z graph_break [] 2025-12-04T12:12:58.0323182Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0323902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0324013Z warnings.warn( 2025-12-04T12:12:58.0324563Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0324683Z Traceback (most recent call last): 2025-12-04T12:12:58.0325200Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0325440Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0325660Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0325665Z 2025-12-04T12:12:58.0325873Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0326800Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0326807Z 2025-12-04T12:12:58.0327081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0327294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0327454Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0327567Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0327897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0328126Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0328221Z graph_break [] 2025-12-04T12:12:58.0328431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0329155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0329283Z warnings.warn( 2025-12-04T12:12:58.0329504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0329611Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0329721Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0329946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0330273Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0330367Z graph_break [] 2025-12-04T12:12:58.0330589Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0331296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0331408Z warnings.warn( 2025-12-04T12:12:58.0331548Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0332098Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _ 2025-12-04T12:12:58.0332228Z Traceback (most recent call last): 2025-12-04T12:12:58.0332689Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0332882Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0333099Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0333106Z 2025-12-04T12:12:58.0333313Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0334249Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0334257Z 2025-12-04T12:12:58.0334516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0334740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0334854Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0334966Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0335307Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0335580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0335678Z graph_break [] 2025-12-04T12:12:58.0335932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0336648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0336760Z warnings.warn( 
2025-12-04T12:12:58.0336967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0337078Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0337206Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0337418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0337746Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0337851Z graph_break [] 2025-12-04T12:12:58.0338091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0338801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0338913Z warnings.warn( 2025-12-04T12:12:58.0339120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0339241Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0339353Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0339600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0339939Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0340034Z graph_break [] 2025-12-04T12:12:58.0340240Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0340967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0341063Z warnings.warn( 2025-12-04T12:12:58.0341889Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml - 2025-12-04T12:12:58.0342054Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0343113Z FAILED [0.1610s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0343135Z 2025-12-04T12:12:58.0343346Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0344279Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0344285Z 2025-12-04T12:12:58.0344557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0344737Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0344944Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ================== 2025-12-04T12:12:58.0345043Z Got exit code 1 2025-12-04T12:12:58.0345885Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False 2025-12-04T12:12:58.0346308Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:58.0346933Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml 2025-12-04T12:12:58.0347141Z ============================= test session starts ============================== 2025-12-04T12:12:58.0347485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0347622Z cachedir: .pytest_cache 2025-12-04T12:12:58.0348147Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0348269Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0348377Z configfile: pytest.ini 2025-12-04T12:12:58.0348969Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0349196Z collecting ... collected 380 items / 167 deselected / 213 selected 2025-12-04T12:12:58.0349356Z stepcurrent: skipping 167 already run items. 2025-12-04T12:12:58.0349471Z Running 8 items in this shard 2025-12-04T12:12:58.0349476Z 2025-12-04T12:12:58.0350520Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 12%] 2025-12-04T12:12:58.0351434Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5521s] [ 25%] 2025-12-04T12:12:58.0352333Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [ 25%] 2025-12-04T12:12:58.0353190Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [ 25%] 2025-12-04T12:12:58.0353196Z 2025-12-04T12:12:58.0353339Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0353904Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0354025Z Traceback (most recent call last): 2025-12-04T12:12:58.0354488Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0354697Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0354904Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0354909Z 2025-12-04T12:12:58.0355117Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0356063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0356068Z 2025-12-04T12:12:58.0356330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0356557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0356673Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0356784Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0357132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0357345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0357458Z graph_break [] 2025-12-04T12:12:58.0357666Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0358379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0358489Z warnings.warn( 2025-12-04T12:12:58.0359076Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0359194Z Traceback (most recent call last): 2025-12-04T12:12:58.0359690Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:58.0359881Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0360098Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0360103Z 2025-12-04T12:12:58.0360311Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0361239Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0361255Z 2025-12-04T12:12:58.0361512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0361759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0361880Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0361993Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0362396Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0362624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0362718Z graph_break [] 2025-12-04T12:12:58.0362929Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0363698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0363797Z warnings.warn( 2025-12-04T12:12:58.0364018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0364127Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0364239Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0364468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0364794Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:58.0364890Z graph_break [] 2025-12-04T12:12:58.0365113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0365829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0365943Z warnings.warn( 2025-12-04T12:12:58.0366083Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0366635Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0366772Z Traceback (most recent call last): 2025-12-04T12:12:58.0367238Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0367445Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0367653Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0367658Z 2025-12-04T12:12:58.0367866Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0368807Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0368815Z 2025-12-04T12:12:58.0369078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0369302Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0369410Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0369522Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0369908Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0370123Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('not_ok', 1)] 2025-12-04T12:12:58.0370249Z graph_break [] 2025-12-04T12:12:58.0370470Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0371182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0371295Z warnings.warn( 2025-12-04T12:12:58.0371502Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0371609Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0371734Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0371951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0372310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0372425Z graph_break [] 2025-12-04T12:12:58.0372635Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0373358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0373457Z warnings.warn( 2025-12-04T12:12:58.0373664Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0373817Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0373927Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0374140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0374477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0374574Z graph_break [] 2025-12-04T12:12:58.0374783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0375503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0375602Z warnings.warn( 2025-12-04T12:12:58.0376410Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml - 2025-12-04T12:12:58.0376575Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0377633Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0377651Z 2025-12-04T12:12:58.0377860Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0378791Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0378802Z 2025-12-04T12:12:58.0379073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0379246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0379474Z ============ 1 failed, 1 skipped, 167 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:58.0379573Z Got exit code 1 2025-12-04T12:12:58.0379676Z Retrying single test... 
2025-12-04T12:12:58.0380314Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml 2025-12-04T12:12:58.0380471Z ============================= test session starts ============================== 2025-12-04T12:12:58.0380816Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0380966Z cachedir: .pytest_cache 2025-12-04T12:12:58.0381474Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0381638Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0381741Z configfile: pytest.ini 2025-12-04T12:12:58.0382311Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0382547Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0383562Z stepcurrent: skipping 168 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0383687Z Running 1 items in this shard 2025-12-04T12:12:58.0383691Z 2025-12-04T12:12:58.0384612Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5510s] [100%] 2025-12-04T12:12:58.0385494Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%] 2025-12-04T12:12:58.0386317Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1577s] [100%] 2025-12-04T12:12:58.0386352Z 2025-12-04T12:12:58.0386491Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0387055Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0387177Z Traceback (most recent call last): 2025-12-04T12:12:58.0387640Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0387847Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0388051Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0388056Z 2025-12-04T12:12:58.0388279Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0389212Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0389220Z 2025-12-04T12:12:58.0389490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0389703Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0389815Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0389940Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0390268Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0390486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0390592Z graph_break [] 2025-12-04T12:12:58.0390801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0391529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0391630Z warnings.warn( 2025-12-04T12:12:58.0392181Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0392312Z Traceback (most recent call last): 2025-12-04T12:12:58.0392771Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0392995Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0393242Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0393247Z 2025-12-04T12:12:58.0393456Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0394395Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0394402Z 2025-12-04T12:12:58.0394661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0394874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0394998Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0395111Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0395490Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0395706Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0395804Z graph_break [] 2025-12-04T12:12:58.0396026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0396743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0396880Z warnings.warn( 2025-12-04T12:12:58.0397107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0397218Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0397344Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0397560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0397884Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0397994Z graph_break [] 2025-12-04T12:12:58.0398205Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0398920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0399032Z warnings.warn( 2025-12-04T12:12:58.0399173Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0399739Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0399859Z Traceback (most recent call last): 2025-12-04T12:12:58.0400321Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0400529Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0400737Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0400742Z 2025-12-04T12:12:58.0401237Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0402229Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0402236Z 2025-12-04T12:12:58.0402503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0402736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0402847Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0402962Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0403313Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0403527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0403640Z graph_break [] 2025-12-04T12:12:58.0403932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0404654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0404814Z warnings.warn( 
2025-12-04T12:12:58.0405025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0405133Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0405263Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0405477Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0405818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0405914Z graph_break [] 2025-12-04T12:12:58.0406125Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0406896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0406997Z warnings.warn( 2025-12-04T12:12:58.0407204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0407329Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0407440Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0407669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0408054Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0408150Z graph_break [] 2025-12-04T12:12:58.0408374Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0409082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0409183Z warnings.warn( 2025-12-04T12:12:58.0410001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml - 2025-12-04T12:12:58.0410171Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0411252Z FAILED [0.1577s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0411260Z 2025-12-04T12:12:58.0411472Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0412414Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0412419Z 2025-12-04T12:12:58.0412685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0412863Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0413078Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ================== 2025-12-04T12:12:58.0413176Z Got exit code 1 2025-12-04T12:12:58.0413283Z Retrying single test... 
2025-12-04T12:12:58.0413929Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml 2025-12-04T12:12:58.0414089Z ============================= test session starts ============================== 2025-12-04T12:12:58.0414442Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0414550Z cachedir: .pytest_cache 2025-12-04T12:12:58.0415055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0415228Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0415338Z configfile: pytest.ini 2025-12-04T12:12:58.0415952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0416171Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0417182Z stepcurrent: skipping 168 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0417307Z Running 1 items in this shard 2025-12-04T12:12:58.0417312Z 2025-12-04T12:12:58.0418211Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5367s] [100%] 2025-12-04T12:12:58.0419228Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%] 2025-12-04T12:12:58.0420044Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1561s] [100%] 2025-12-04T12:12:58.0420049Z 2025-12-04T12:12:58.0420201Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0420778Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0420895Z Traceback (most recent call last): 2025-12-04T12:12:58.0421369Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0421563Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0421771Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0421776Z 2025-12-04T12:12:58.0421998Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0422931Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0422936Z 2025-12-04T12:12:58.0423209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0423422Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0423534Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0423658Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0423989Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0424215Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0424313Z graph_break [] 2025-12-04T12:12:58.0424520Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0425253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0425353Z warnings.warn( 2025-12-04T12:12:58.0425903Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0426034Z Traceback (most recent call last): 2025-12-04T12:12:58.0426495Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0426700Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0426906Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0426913Z 2025-12-04T12:12:58.0427149Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0428094Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0428129Z 2025-12-04T12:12:58.0428390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0428615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0428725Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0428837Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0429179Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0429391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0429485Z graph_break [] 2025-12-04T12:12:58.0429736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0430452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0430565Z warnings.warn( 2025-12-04T12:12:58.0430772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0430880Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0431002Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0431247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0431576Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0431683Z graph_break [] 2025-12-04T12:12:58.0431892Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0432619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0432718Z warnings.warn( 2025-12-04T12:12:58.0432860Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0433427Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0433547Z Traceback (most recent call last): 2025-12-04T12:12:58.0434016Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0434208Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0434417Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0434422Z 2025-12-04T12:12:58.0434645Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0435582Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0435589Z 2025-12-04T12:12:58.0435861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0436070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0436180Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0436303Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0436633Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0436847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0436956Z graph_break [] 2025-12-04T12:12:58.0437163Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0437924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0438021Z warnings.warn( 
2025-12-04T12:12:58.0438228Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0438378Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0438487Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0438701Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0439038Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0439136Z graph_break [] 2025-12-04T12:12:58.0439359Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0440069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0440165Z warnings.warn( 2025-12-04T12:12:58.0440415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0440524Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0440633Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0440863Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0441191Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0441299Z graph_break [] 2025-12-04T12:12:58.0441506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0442320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0442434Z warnings.warn( 2025-12-04T12:12:58.0443238Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml - 2025-12-04T12:12:58.0443407Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0444478Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0444486Z 2025-12-04T12:12:58.0444697Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0445637Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0445644Z 2025-12-04T12:12:58.0445902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0446091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0446291Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ================== 2025-12-04T12:12:58.0446388Z Got exit code 1 2025-12-04T12:12:58.0447245Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0447647Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:58.0448285Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml 2025-12-04T12:12:58.0448447Z ============================= test session starts ============================== 2025-12-04T12:12:58.0448786Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0448905Z cachedir: .pytest_cache 2025-12-04T12:12:58.0449448Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0449570Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0449716Z configfile: pytest.ini 2025-12-04T12:12:58.0450290Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0450523Z collecting ... collected 380 items / 169 deselected / 211 selected 2025-12-04T12:12:58.0450666Z stepcurrent: skipping 169 already run items. 2025-12-04T12:12:58.0450780Z Running 6 items in this shard 2025-12-04T12:12:58.0450785Z 2025-12-04T12:12:58.0451808Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 16%] 2025-12-04T12:12:58.0452748Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5557s] [ 33%] 2025-12-04T12:12:58.0453648Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [ 33%] 2025-12-04T12:12:58.0454468Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1551s] [ 33%] 2025-12-04T12:12:58.0454503Z 2025-12-04T12:12:58.0454655Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0455204Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0455324Z Traceback (most recent call last): 2025-12-04T12:12:58.0455804Z File 
"/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0456000Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0456209Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0456227Z 2025-12-04T12:12:58.0456436Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0457367Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0457374Z 2025-12-04T12:12:58.0457647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0457861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0457985Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0458096Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0458431Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0458657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0458754Z graph_break [] 2025-12-04T12:12:58.0458964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0459692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0459792Z warnings.warn( 2025-12-04T12:12:58.0460356Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0460474Z Traceback (most recent call last): 2025-12-04T12:12:58.0460933Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 
2025-12-04T12:12:58.0461138Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0461371Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0461377Z 2025-12-04T12:12:58.0461612Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0462554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0462559Z 2025-12-04T12:12:58.0462819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0463042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0463160Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0463272Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0463616Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0463860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0463976Z graph_break [] 2025-12-04T12:12:58.0464186Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0464905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0465019Z warnings.warn( 2025-12-04T12:12:58.0465226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0465364Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0465490Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0465705Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0466048Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 
2025-12-04T12:12:58.0466146Z graph_break [] 2025-12-04T12:12:58.0466356Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0467084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0467186Z warnings.warn( 2025-12-04T12:12:58.0467327Z =================================== FAILURES =================================== 2025-12-04T12:12:58.0467901Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0468022Z Traceback (most recent call last): 2025-12-04T12:12:58.0468498Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0468690Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0468895Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0468901Z 2025-12-04T12:12:58.0469128Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0470060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0470067Z 2025-12-04T12:12:58.0470345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0470555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0470668Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0470797Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0471128Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0471339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('not_ok', 1)] 2025-12-04T12:12:58.0471452Z graph_break [] 2025-12-04T12:12:58.0471660Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0472421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0472551Z warnings.warn( 2025-12-04T12:12:58.0472758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0472878Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0472987Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0473199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0473544Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0473639Z graph_break [] 2025-12-04T12:12:58.0473858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0474598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0474698Z warnings.warn( 2025-12-04T12:12:58.0474920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0475029Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0475138Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0475359Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0475683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0475843Z graph_break [] 2025-12-04T12:12:58.0476051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0476760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: 
UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0476868Z warnings.warn( 2025-12-04T12:12:58.0477672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml - 2025-12-04T12:12:58.0477854Z =========================== short test summary info ============================ 2025-12-04T12:12:58.0478911Z FAILED [0.1551s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0478918Z 2025-12-04T12:12:58.0479127Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0480070Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0480076Z 2025-12-04T12:12:58.0480332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0480522Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0480740Z ============ 1 failed, 1 skipped, 169 deselected, 2 rerun in 4.93s ============= 2025-12-04T12:12:58.0480837Z Got exit code 1 2025-12-04T12:12:58.0480955Z Retrying single test... 
2025-12-04T12:12:58.0481582Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml 2025-12-04T12:12:58.0481754Z ============================= test session starts ============================== 2025-12-04T12:12:58.0482095Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0482267Z cachedir: .pytest_cache 2025-12-04T12:12:58.0482791Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0482912Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0483021Z configfile: pytest.ini 2025-12-04T12:12:58.0483649Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0483903Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0484934Z stepcurrent: skipping 170 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0485049Z Running 1 items in this shard 2025-12-04T12:12:58.0485054Z 2025-12-04T12:12:58.0485947Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5407s] [100%] 2025-12-04T12:12:58.0486881Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1573s] [100%] 2025-12-04T12:12:58.0487693Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [100%] 2025-12-04T12:12:58.0487701Z 2025-12-04T12:12:58.0487852Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0488405Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0488568Z Traceback (most recent call last): 2025-12-04T12:12:58.0489030Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0489220Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0489442Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0489449Z 2025-12-04T12:12:58.0489661Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0490601Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0490609Z 2025-12-04T12:12:58.0490868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0491084Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0491205Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0491317Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0491649Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0491872Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0491968Z graph_break [] 2025-12-04T12:12:58.0492194Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0492911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0493012Z warnings.warn( 2025-12-04T12:12:58.0493576Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0493695Z Traceback (most recent call last): 2025-12-04T12:12:58.0494158Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0494365Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0494571Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0494576Z 2025-12-04T12:12:58.0494797Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0495756Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0495804Z 2025-12-04T12:12:58.0496076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0496288Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0496397Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0496525Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0496856Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0497066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0497176Z graph_break [] 2025-12-04T12:12:58.0497387Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0498147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0498250Z warnings.warn( 2025-12-04T12:12:58.0498462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0498583Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0498696Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0498908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0499277Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0499372Z graph_break [] 2025-12-04T12:12:58.0499585Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0500306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0500409Z warnings.warn( 2025-12-04T12:12:58.0500562Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0501379Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0501504Z Traceback (most recent call last): 2025-12-04T12:12:58.0501977Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0502176Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0502398Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0502403Z 2025-12-04T12:12:58.0502612Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0503545Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0503550Z 2025-12-04T12:12:58.0503824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0504038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0504162Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0504275Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0504604Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0504829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0504925Z graph_break [] 2025-12-04T12:12:58.0505133Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0505857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0505955Z warnings.warn( 
2025-12-04T12:12:58.0506260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0506367Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0506522Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0506752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0507084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0507177Z graph_break [] 2025-12-04T12:12:58.0507419Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0508129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0508240Z warnings.warn( 2025-12-04T12:12:58.0508448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0508557Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0508725Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0508942Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0509270Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0509380Z graph_break [] 2025-12-04T12:12:58.0509588Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0510307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0510450Z warnings.warn( 2025-12-04T12:12:58.0511248Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml - 2025-12-04T12:12:58.0511429Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0512493Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0512501Z 2025-12-04T12:12:58.0512731Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0513665Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0513672Z 2025-12-04T12:12:58.0513931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0514118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0514315Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:58.0514431Z Got exit code 1 2025-12-04T12:12:58.0514537Z Retrying single test... 
2025-12-04T12:12:58.0515165Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml 2025-12-04T12:12:58.0515338Z ============================= test session starts ============================== 2025-12-04T12:12:58.0515676Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0515782Z cachedir: .pytest_cache 2025-12-04T12:12:58.0516302Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0516422Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0516540Z configfile: pytest.ini 2025-12-04T12:12:58.0517109Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0517334Z collecting ... collected 380 items / 174 deselected / 206 selected 2025-12-04T12:12:58.0518409Z stepcurrent: skipping 170 already run items. 
Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0518549Z Running 1 items in this shard 2025-12-04T12:12:58.0518555Z 2025-12-04T12:12:58.0519459Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5398s] [100%] 2025-12-04T12:12:58.0520350Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%] 2025-12-04T12:12:58.0521204Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%] 2025-12-04T12:12:58.0521211Z 2025-12-04T12:12:58.0521350Z ==================================== RERUNS ==================================== 2025-12-04T12:12:58.0521898Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0522029Z Traceback (most recent call last): 2025-12-04T12:12:58.0522551Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0522797Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0523006Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0523011Z 2025-12-04T12:12:58.0523222Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0524179Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0524185Z 2025-12-04T12:12:58.0524447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0524677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0524789Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0524903Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0525251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0525468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0525565Z graph_break [] 2025-12-04T12:12:58.0525793Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0526513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0526629Z warnings.warn( 2025-12-04T12:12:58.0527182Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0527303Z Traceback (most recent call last): 2025-12-04T12:12:58.0527776Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0527970Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0528177Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0528182Z 2025-12-04T12:12:58.0528407Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0529332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py 
NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0529337Z 2025-12-04T12:12:58.0529646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0529861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0530000Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0530130Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0530459Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0530687Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0530786Z graph_break [] 2025-12-04T12:12:58.0530998Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0531727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0531828Z warnings.warn( 2025-12-04T12:12:58.0532069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0532196Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0532310Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0532543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0532871Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0532967Z graph_break [] 2025-12-04T12:12:58.0533189Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0533929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0534025Z warnings.warn( 2025-12-04T12:12:58.0534179Z 
=================================== FAILURES =================================== 2025-12-04T12:12:58.0534734Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _ 2025-12-04T12:12:58.0534869Z Traceback (most recent call last): 2025-12-04T12:12:58.0535326Z File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd 2025-12-04T12:12:58.0535521Z act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f) 2025-12-04T12:12:58.0535740Z ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0535745Z 2025-12-04T12:12:58.0535953Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0536894Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0536899Z 2025-12-04T12:12:58.0537156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0537366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0537488Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0537600Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0537929Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0538151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0538245Z graph_break [] 2025-12-04T12:12:58.0538466Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0539178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0539277Z warnings.warn( 
2025-12-04T12:12:58.0539498Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0539604Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0539717Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0539988Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0540312Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0540451Z graph_break [] 2025-12-04T12:12:58.0540664Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0541373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0541489Z warnings.warn( 2025-12-04T12:12:58.0541697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T12:12:58.0541805Z frames [('total', 1), ('ok', 1)] 2025-12-04T12:12:58.0541929Z stats [('calls_captured', 10)] 2025-12-04T12:12:58.0542146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)] 2025-12-04T12:12:58.0542513Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)] 2025-12-04T12:12:58.0542611Z graph_break [] 2025-12-04T12:12:58.0542818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T12:12:58.0543543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping 2025-12-04T12:12:58.0543642Z warnings.warn( 2025-12-04T12:12:58.0544441Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml - 2025-12-04T12:12:58.0544650Z =========================== short test summary info 
============================ 2025-12-04T12:12:58.0545706Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0) 2025-12-04T12:12:58.0545714Z 2025-12-04T12:12:58.0545942Z To execute this test, run the following from the base repo dir: 2025-12-04T12:12:58.0546868Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0546876Z 2025-12-04T12:12:58.0547146Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T12:12:58.0547318Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T12:12:58.0547514Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ================== 2025-12-04T12:12:58.0547624Z Got exit code 1 2025-12-04T12:12:58.0548468Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False 2025-12-04T12:12:58.0548884Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T12:12:58.0549507Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml 2025-12-04T12:12:58.0549670Z ============================= test session starts ============================== 2025-12-04T12:12:58.0550025Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T12:12:58.0550132Z cachedir: .pytest_cache 2025-12-04T12:12:58.0550650Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, 
derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T12:12:58.0550772Z rootdir: /var/lib/jenkins/workspace 2025-12-04T12:12:58.0550879Z configfile: pytest.ini 2025-12-04T12:12:58.0551463Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0 2025-12-04T12:12:58.0551717Z collecting ... collected 380 items / 171 deselected / 209 selected 2025-12-04T12:12:58.0551864Z stepcurrent: skipping 171 already run items. 2025-12-04T12:12:58.0552018Z Running 4 items in this shard 2025-12-04T12:12:58.0552023Z 2025-12-04T12:12:58.0553034Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 25%] 2025-12-04T12:12:58.0554049Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [ 50%] 2025-12-04T12:12:58.0555071Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0035s] (Skip non-critical tests to save resources.) 
[ 75%] 2025-12-04T12:12:58.0555803Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims1 SKIPPED [0.0027s] (Mix order reduction not enabled) [100%] 2025-12-04T12:12:58.0555811Z 2025-12-04T12:12:58.0556614Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml - 2025-12-04T12:12:58.0556797Z ====================== 4 skipped, 171 deselected in 0.06s ====================== 2025-12-04T12:12:58.0591720Z The following tests failed consistently: ['test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False'] 2025-12-04T12:12:58.0591930Z 2025-12-04T12:12:58.0592535Z FINISHED PRINTING LOG FILE of inductor/test_mix_order_reduction 1/2 (test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log) 2025-12-04T12:12:58.0592542Z 2025-12-04T12:12:58.0592936Z Finished inductor/test_mix_order_reduction 1/2 ... [2025-12-04 12:12:57.443674][10735.053576991], took 40.64min 2025-12-04T12:12:58.0593788Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml 2025-12-04T12:12:58.0594749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml 2025-12-04T12:12:58.0595599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml 2025-12-04T12:12:58.0596443Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml 2025-12-04T12:12:58.0597301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml 2025-12-04T12:12:58.0598140Z 
Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml 2025-12-04T12:12:58.0599009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml 2025-12-04T12:12:58.0599865Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml 2025-12-04T12:12:58.0600721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml 2025-12-04T12:12:58.0602255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml 2025-12-04T12:12:58.0603163Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml 2025-12-04T12:12:58.0604029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml 2025-12-04T12:12:58.0604958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml 2025-12-04T12:12:58.0605818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml 2025-12-04T12:12:58.0606712Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml 2025-12-04T12:12:58.0949972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml 2025-12-04T12:12:58.1298787Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml 2025-12-04T12:12:58.1696064Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml 2025-12-04T12:12:58.2076438Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml 2025-12-04T12:12:58.2402341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml 2025-12-04T12:12:58.2868060Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml 2025-12-04T12:12:58.3243787Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml 2025-12-04T12:12:58.3688093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml 2025-12-04T12:12:58.4148322Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml 2025-12-04T12:12:58.4482346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml 2025-12-04T12:12:58.4916007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml 2025-12-04T12:12:58.5272134Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml 2025-12-04T12:12:58.5667915Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml 2025-12-04T12:12:58.6005719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml 2025-12-04T12:12:58.6329597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml 2025-12-04T12:12:58.6779253Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml 2025-12-04T12:12:58.7055320Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml 2025-12-04T12:12:58.7367394Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml 2025-12-04T12:12:58.7681055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml 2025-12-04T12:12:58.8228022Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml 2025-12-04T12:12:58.8551913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml 2025-12-04T12:12:58.9241224Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml 2025-12-04T12:12:58.9595042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml 2025-12-04T12:12:58.9896871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml 2025-12-04T12:12:59.0242557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml 2025-12-04T12:12:59.0530097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml 2025-12-04T12:12:59.0863314Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml 2025-12-04T12:12:59.1147616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml 2025-12-04T12:12:59.1433957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml 2025-12-04T12:12:59.1763256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml 2025-12-04T12:12:59.2172212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml 2025-12-04T12:12:59.2493974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml 2025-12-04T12:12:59.2772228Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml 2025-12-04T12:12:59.3084960Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml 2025-12-04T12:12:59.3639783Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml 2025-12-04T12:12:59.4016780Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml 2025-12-04T12:12:59.4410269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml 2025-12-04T12:12:59.4726115Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml 2025-12-04T12:12:59.5029829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml 2025-12-04T12:12:59.5531942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml 2025-12-04T12:12:59.5915467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml 2025-12-04T12:12:59.6262205Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml 2025-12-04T12:12:59.6777497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml 2025-12-04T12:12:59.7075322Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml 2025-12-04T12:12:59.7357067Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml 2025-12-04T12:12:59.7667300Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml 2025-12-04T12:12:59.7948826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml 2025-12-04T12:12:59.8302501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml 2025-12-04T12:12:59.8630311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml 2025-12-04T12:12:59.8950637Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml 2025-12-04T12:12:59.9266945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml 2025-12-04T12:12:59.9627672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml 2025-12-04T12:12:59.9996664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml 2025-12-04T12:13:00.0595111Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml 2025-12-04T12:13:00.1102346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml 2025-12-04T12:13:00.1482169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml 2025-12-04T12:13:00.1860805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml 2025-12-04T12:13:00.2414334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml 2025-12-04T12:13:00.2843419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml 2025-12-04T12:13:00.3168467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml 2025-12-04T12:13:00.3485332Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml 2025-12-04T12:13:00.3820983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml 2025-12-04T12:13:00.4175809Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml 2025-12-04T12:13:00.4496378Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml 2025-12-04T12:13:00.4861464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml 2025-12-04T12:13:00.5275608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml 2025-12-04T12:13:00.5603822Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml 2025-12-04T12:13:00.5975109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml 2025-12-04T12:13:00.6380922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml 2025-12-04T12:13:00.6722004Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml 2025-12-04T12:13:00.7127716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml 2025-12-04T12:13:00.7454768Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml 2025-12-04T12:13:00.7735799Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml 2025-12-04T12:13:00.8151761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml 2025-12-04T12:13:00.8461860Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml 2025-12-04T12:13:00.8739945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml 2025-12-04T12:13:00.9241707Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml 2025-12-04T12:13:00.9553714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml 2025-12-04T12:13:00.9867032Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml 2025-12-04T12:13:01.0211932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml 2025-12-04T12:13:01.0511922Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml 2025-12-04T12:13:01.1011373Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml 2025-12-04T12:13:01.1332869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml 2025-12-04T12:13:01.1824730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml 2025-12-04T12:13:01.2231670Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml 2025-12-04T12:13:01.2762563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml 2025-12-04T12:13:01.3630045Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml 2025-12-04T12:13:01.3944517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml 2025-12-04T12:13:01.4395226Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml 2025-12-04T12:13:01.4713462Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml 2025-12-04T12:13:01.4992285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml 2025-12-04T12:13:01.5296566Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml 2025-12-04T12:13:01.5621502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml 2025-12-04T12:13:01.5934944Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml 2025-12-04T12:13:01.6283758Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml 2025-12-04T12:13:01.6570511Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml 2025-12-04T12:13:01.6938916Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml 2025-12-04T12:13:01.7475582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml 2025-12-04T12:13:01.7938971Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml 2025-12-04T12:13:01.8431146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml 2025-12-04T12:13:01.8724552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml 2025-12-04T12:13:01.9046015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml 2025-12-04T12:13:01.9422556Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml 2025-12-04T12:13:01.9756303Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml 2025-12-04T12:13:02.0029015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml 2025-12-04T12:13:02.0769461Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml 2025-12-04T12:13:02.1297786Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml 2025-12-04T12:13:02.1634029Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml 2025-12-04T12:13:02.1918441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml 2025-12-04T12:13:02.2236278Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml 2025-12-04T12:13:02.2607697Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml 2025-12-04T12:13:02.3076002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml 2025-12-04T12:13:02.3646059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml 2025-12-04T12:13:02.3936256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml 2025-12-04T12:13:02.4339353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml 2025-12-04T12:13:02.9163818Z Uploading logs for 57119749427 to S3 2025-12-04T12:13:03.0374297Z Uploading artifacts took 0.57 seconds 2025-12-04T12:13:03.0374934Z inductor/test_mix_order_reduction 1/2 failed! 2025-12-04T12:13:03.0378699Z Running test_transformers 1/1 ... 
[2025-12-04 12:13:03.037692][10740.647598395] 2025-12-04T12:13:03.0379235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:13:03.0383715Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:13:03.038144] 2025-12-04T12:14:02.7418367Z 2025-12-04T12:14:02.7419598Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_cd619bbaee31992c_.log 2025-12-04T12:14:03.7592680Z Running 10091 items in this shard: test/test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda, test/test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda, test/test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda, test/test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_fp16_overflow_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_broken_166211_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_compiles_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_seqlen1_dropout_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, 
test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda, 
test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda
2025-12-04T12:14:04.7686107Z
2025-12-04T12:14:04.7686457Z Finished test_transformers 1/1 ... [2025-12-04 12:14:02.768777][10800.378679093], took 1.00min
2025-12-04T12:14:04.7687630Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.xml
2025-12-04T12:14:04.7688704Z Running test_autograd 1/1 ... [2025-12-04 12:14:03.169345][10800.779250933]
2025-12-04T12:14:04.7689204Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:14:04.7690370Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:14:03.169809]
2025-12-04T12:15:29.4088548Z
2025-12-04T12:15:29.4089504Z test_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_1.1_343bbb8e8e4f4e62_.log
2025-12-04T12:15:29.4349401Z Running 659 items in this shard: test/test_autograd.py::TestAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/test_autograd.py::TestAutograd::test_accumulate_grad, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_should_not_execute, test/test_autograd.py::TestAutograd::test_accumulate_grad_tensor_reference, test/test_autograd.py::TestAutograd::test_accumulate_grad_with_zero_numel_grad, test/test_autograd.py::TestAutograd::test_anomaly_assign_parent_cleanup, test/test_autograd.py::TestAutograd::test_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_anomaly_grad_warnings,
test/test_autograd.py::TestAutograd::test_anomaly_mode_no_check_nan, test/test_autograd.py::TestAutograd::test_attribute_deletion, test/test_autograd.py::TestAutograd::test_autograd_inplace_view_of_view, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_creation_meta, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_cross_dtype, test/test_autograd.py::TestAutograd::test_autograd_multiple_views_python, test/test_autograd.py::TestAutograd::test_autograd_node_isinstance, test/test_autograd.py::TestAutograd::test_autograd_print_tensor, test/test_autograd.py::TestAutograd::test_autograd_python_custom_function_inplace, test/test_autograd.py::TestAutograd::test_autograd_simple_views_python, test/test_autograd.py::TestAutograd::test_autograd_views_codegen, test/test_autograd.py::TestAutograd::test_backward, test/test_autograd.py::TestAutograd::test_backward_badcalls, test/test_autograd.py::TestAutograd::test_backward_copy, test/test_autograd.py::TestAutograd::test_backward_create_graph_warns, test/test_autograd.py::TestAutograd::test_backward_hook_relative_ordering, test/test_autograd.py::TestAutograd::test_backward_no_grad, test/test_autograd.py::TestAutograd::test_backward_to_node, test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_with_inputs, test/test_autograd.py::TestAutograd::test_backward_with_nonleaf_inputs, test/test_autograd.py::TestAutograd::test_backward_with_scalar_input, test/test_autograd.py::TestAutograd::test_calculate_shape_util, test/test_autograd.py::TestAutograd::test_callback_adds_callback, test/test_autograd.py::TestAutograd::test_callback_propagates_errors_from_device_thread, 
test/test_autograd.py::TestAutograd::test_cant_create_saved_tensors, test/test_autograd.py::TestAutograd::test_checkpoint_detects_non_determinism, test/test_autograd.py::TestAutograd::test_checkpoint_graph_execution_group, test/test_autograd.py::TestAutograd::test_checkpoint_sequential_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpoint_valid_reset_on_error, test/test_autograd.py::TestAutograd::test_checkpoint_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpointing, test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_cpu, test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_gpu, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_arbitrary_input_output, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_correct_grad, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_custom_function_works, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_dataparallel, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_memory_savings, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_parameter_used_in_an_out, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_saved_object_identity, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_with_context_fn, test/test_autograd.py::TestAutograd::test_copy_slices_graph_task_updates, 
test/test_autograd.py::TestAutograd::test_create_graph_and_full_backward_hook_cycle, test/test_autograd.py::TestAutograd::test_current_graph_task_execution_order, test/test_autograd.py::TestAutograd::test_current_graph_task_id, test/test_autograd.py::TestAutograd::test_current_node, test/test_autograd.py::TestAutograd::test_custom_autograd_ac_early_stop, test/test_autograd.py::TestAutograd::test_custom_autograd_no_early_free, test/test_autograd.py::TestAutograd::test_custom_autograd_repeated_grad_grad, test/test_autograd.py::TestAutograd::test_custom_function_cycle, test/test_autograd.py::TestAutograd::test_custom_function_error, test/test_autograd.py::TestAutograd::test_custom_function_exception, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_forward_is_no_op, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_inplace_checks, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_view_checks, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_wrong_formula, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_non_default_view, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_view_of_leaf, test/test_autograd.py::TestAutograd::test_custom_function_local_inplace, test/test_autograd.py::TestAutograd::test_custom_function_mark_dirty_not_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_mark_output_view_of_intermediate, test/test_autograd.py::TestAutograd::test_custom_function_no_tensors, test/test_autograd.py::TestAutograd::test_custom_function_non_tensor_inputs_outputs, test/test_autograd.py::TestAutograd::test_custom_function_preserve_torch_function_when_return_as_is, 
test/test_autograd.py::TestAutograd::test_custom_function_return_view_in_nograd, test/test_autograd.py::TestAutograd::test_custom_function_save_for_forward, test/test_autograd.py::TestAutograd::test_custom_function_saved_tensors, test/test_autograd.py::TestAutograd::test_custom_function_saving_mutated_view_no_leak, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_input, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_output, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_simple, test/test_autograd.py::TestAutograd::test_custom_function_vmap_defaults, test/test_autograd.py::TestAutograd::test_deep_reentrant, test/test_autograd.py::TestAutograd::test_default_saved_tensors_hooks_double_backward, test/test_autograd.py::TestAutograd::test_dep_nograd, test/test_autograd.py::TestAutograd::test_dependent_backward, test/test_autograd.py::TestAutograd::test_detach, test/test_autograd.py::TestAutograd::test_detach_base, test/test_autograd.py::TestAutograd::test_detach_then_inplace_raises_in_autograd, test/test_autograd.py::TestAutograd::test_diagonal_expanded_v, test/test_autograd.py::TestAutograd::test_dir, test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks, test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks_nested, test/test_autograd.py::TestAutograd::test_dont_materialize_grads, test/test_autograd.py::TestAutograd::test_duplicate_backward_root, test/test_autograd.py::TestAutograd::test_enable_grad_decorator_no_paren, test/test_autograd.py::TestAutograd::test_first_grad_fn_access_in_no_grad_mode, test/test_autograd.py::TestAutograd::test_free_deep_graph, test/test_autograd.py::TestAutograd::test_free_deep_graph_complicated, test/test_autograd.py::TestAutograd::test_free_deep_graph_pyfunction, test/test_autograd.py::TestAutograd::test_full_backward_hook_double_backward, test/test_autograd.py::TestAutograd::test_function, 
test/test_autograd.py::TestAutograd::test_function_returns_input, test/test_autograd.py::TestAutograd::test_function_returns_undefined_tensor, test/test_autograd.py::TestAutograd::test_gc_in_destructor, test/test_autograd.py::TestAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/test_autograd.py::TestAutograd::test_grad, test/test_autograd.py::TestAutograd::test_grad_badcalls, test/test_autograd.py::TestAutograd::test_grad_batched_grad, test/test_autograd.py::TestAutograd::test_grad_dtype, test/test_autograd.py::TestAutograd::test_grad_empty_inputs, test/test_autograd.py::TestAutograd::test_grad_fn_attr_bindings, test/test_autograd.py::TestAutograd::test_grad_fn_badcalls, test/test_autograd.py::TestAutograd::test_grad_fn_input_metadata, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_multiple_outputs, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_remove_hooks, test/test_autograd.py::TestAutograd::test_grad_materialize_grads, test/test_autograd.py::TestAutograd::test_grad_mode_class_decoration, test/test_autograd.py::TestAutograd::test_grad_mode_restored_reentrant, test/test_autograd.py::TestAutograd::test_grad_nonleaf, test/test_autograd.py::TestAutograd::test_grad_nonleaf_many_outputs, test/test_autograd.py::TestAutograd::test_grad_nonleaf_register_hook, test/test_autograd.py::TestAutograd::test_grad_thread_safety, test/test_autograd.py::TestAutograd::test_grad_to_node, test/test_autograd.py::TestAutograd::test_grad_to_node_inplace, test/test_autograd.py::TestAutograd::test_grad_to_node_materialize, test/test_autograd.py::TestAutograd::test_grad_to_node_multi, test/test_autograd.py::TestAutograd::test_grad_to_node_set, test/test_autograd.py::TestAutograd::test_grad_unreachable, test/test_autograd.py::TestAutograd::test_grad_unreachable_discovery, test/test_autograd.py::TestAutograd::test_gradcheck_backward_mul_by_grad_output, 
test/test_autograd.py::TestAutograd::test_gradcheck_check_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_check_forward_or_backward_only, test/test_autograd.py::TestAutograd::test_gradcheck_check_no_differentiable_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_complex_non_complex_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_custom_error, test/test_autograd.py::TestAutograd::test_gradcheck_default_device_placement_context, test/test_autograd.py::TestAutograd::test_gradcheck_dense_and_sparse_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_get_analytical_jacobian, test/test_autograd.py::TestAutograd::test_gradcheck_get_numerical_jacobian, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout0, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout1, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout2, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout3, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout4, test/test_autograd.py::TestAutograd::test_gradcheck_jacobian_mismatch, test/test_autograd.py::TestAutograd::test_gradcheck_multiple_mkldnn_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_nondeterministic, test/test_autograd.py::TestAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/test_autograd.py::TestAutograd::test_gradcheck_single_input, test/test_autograd.py::TestAutograd::test_gradcheck_test_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_undefined_grad, test/test_autograd.py::TestAutograd::test_gradcheck_validates_input_mkldnn, 
test/test_autograd.py::TestAutograd::test_gradcheck_validates_inputs, test/test_autograd.py::TestAutograd::test_gradient_edge_graph_ownership, test/test_autograd.py::TestAutograd::test_gradient_edge_output, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu_cuda, test/test_autograd.py::TestAutograd::test_hessian_vector, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_edge_case_when_called_with_grad, test/test_autograd.py::TestAutograd::test_hook_none, test/test_autograd.py::TestAutograd::test_hook_with_no_name, test/test_autograd.py::TestAutograd::test_hooks, test/test_autograd.py::TestAutograd::test_hooks_cpp, test/test_autograd.py::TestAutograd::test_increment_version, test/test_autograd.py::TestAutograd::test_index_backward_does_not_save_tensor, test/test_autograd.py::TestAutograd::test_indexing, test/test_autograd.py::TestAutograd::test_indexing_duplicates, test/test_autograd.py::TestAutograd::test_inplace, test/test_autograd.py::TestAutograd::test_inplace_not_requires_grad, test/test_autograd.py::TestAutograd::test_inplace_on_view_backward, test/test_autograd.py::TestAutograd::test_inplace_on_view_leaf_errors, test/test_autograd.py::TestAutograd::test_inplace_on_view_saved_output, test/test_autograd.py::TestAutograd::test_inplace_on_view_weak_grad_fn, test/test_autograd.py::TestAutograd::test_input_buffer_accum, test/test_autograd.py::TestAutograd::test_integer_outputs, test/test_autograd.py::TestAutograd::test_invalid_gradients, 
test/test_autograd.py::TestAutograd::test_isolated_node, test/test_autograd.py::TestAutograd::test_leaf_assignment, test/test_autograd.py::TestAutograd::test_legacy_function_deprecation_exception, test/test_autograd.py::TestAutograd::test_lobpcg, test/test_autograd.py::TestAutograd::test_mark_non_differentiable, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_mixed, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_none, test/test_autograd.py::TestAutograd::test_materialize_grads, test/test_autograd.py::TestAutograd::test_multi_backward, test/test_autograd.py::TestAutograd::test_multi_backward_no_grad, test/test_autograd.py::TestAutograd::test_multi_grad_all_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_hooks_invalid_mode, test/test_autograd.py::TestAutograd::test_multiple_insert_removal_caching, test/test_autograd.py::TestAutograd::test_named_tensor_for_complex_views, test/test_autograd.py::TestAutograd::test_naughty_anomaly_access, test/test_autograd.py::TestAutograd::test_naughty_autograd_function_attribute_access, test/test_autograd.py::TestAutograd::test_naughty_autograd_function_stashing_ctx, test/test_autograd.py::TestAutograd::test_nested_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_nested_anomaly_printstack_cleanup, test/test_autograd.py::TestAutograd::test_next_functions, test/test_autograd.py::TestAutograd::test_no_grad, test/test_autograd.py::TestAutograd::test_no_grad_assignment, test/test_autograd.py::TestAutograd::test_no_grad_copy, test/test_autograd.py::TestAutograd::test_no_grad_copy_sparse, test/test_autograd.py::TestAutograd::test_no_grad_input, test/test_autograd.py::TestAutograd::test_no_grad_modifies_version, test/test_autograd.py::TestAutograd::test_no_grad_python_function, test/test_autograd.py::TestAutograd::test_no_requires_grad_inplace, test/test_autograd.py::TestAutograd::test_no_unnecessary_save, 
test/test_autograd.py::TestAutograd::test_no_unnecessary_unwrapping, test/test_autograd.py::TestAutograd::test_node_ordering_when_none_returned, test/test_autograd.py::TestAutograd::test_node_post_hook_registered_during_unpack_hook, test/test_autograd.py::TestAutograd::test_not_implemented_fwad, test/test_autograd.py::TestAutograd::test_not_implemented_grad, test/test_autograd.py::TestAutograd::test_numpy_requires_grad, test/test_autograd.py::TestAutograd::test_once_differentiable, test/test_autograd.py::TestAutograd::test_out_variant_raises_when_inputs_require_grad, test/test_autograd.py::TestAutograd::test_pack_hook_with_inplace_modification_should_fail, test/test_autograd.py::TestAutograd::test_pickle, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_e2e, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_hooks, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_tensors, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_on_non_leaf, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_ordering, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_returns_not_None, test/test_autograd.py::TestAutograd::test_pow_zero_tensor_gradient, test/test_autograd.py::TestAutograd::test_power_function, test/test_autograd.py::TestAutograd::test_prehook_ordering, test/test_autograd.py::TestAutograd::test_profiler, test/test_autograd.py::TestAutograd::test_profiler_aggregation_fake, test/test_autograd.py::TestAutograd::test_profiler_aggregation_lstm, test/test_autograd.py::TestAutograd::test_profiler_aggregation_table, test/test_autograd.py::TestAutograd::test_profiler_function_event_avg, test/test_autograd.py::TestAutograd::test_profiler_propagation, test/test_autograd.py::TestAutograd::test_profiler_seq_nr, test/test_autograd.py::TestAutograd::test_profiler_shapes, 
test/test_autograd.py::TestAutograd::test_profiler_unboxed_only, test/test_autograd.py::TestAutograd::test_pynode_destruction_deadlock, test/test_autograd.py::TestAutograd::test_record_function, test/test_autograd.py::TestAutograd::test_record_function_callbacks, test/test_autograd.py::TestAutograd::test_record_function_legacy, test/test_autograd.py::TestAutograd::test_record_function_multithreaded, test/test_autograd.py::TestAutograd::test_reentrant_child_error, test/test_autograd.py::TestAutograd::test_reentrant_priority, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_both_depths, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_0, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_1, test/test_autograd.py::TestAutograd::test_reentrant_with_leaf_variable_hook, test/test_autograd.py::TestAutograd::test_reentrant_with_non_leaf_variable_hook, test/test_autograd.py::TestAutograd::test_requires_grad, test/test_autograd.py::TestAutograd::test_requires_grad_, test/test_autograd.py::TestAutograd::test_requires_grad_inplace, test/test_autograd.py::TestAutograd::test_retain_grad, test/test_autograd.py::TestAutograd::test_retain_grad_cycle, test/test_autograd.py::TestAutograd::test_retain_grad_inplace, test/test_autograd.py::TestAutograd::test_retain_grad_inplace_over_view, test/test_autograd.py::TestAutograd::test_retains_grad_can_always_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_retains_grad_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_return_duplicate, test/test_autograd.py::TestAutograd::test_return_duplicate_inplace, test/test_autograd.py::TestAutograd::test_return_leaf, test/test_autograd.py::TestAutograd::test_return_leaf_inplace, test/test_autograd.py::TestAutograd::test_save_none_for_backward, test/test_autograd.py::TestAutograd::test_save_on_cpu_and_checkpoint, test/test_autograd.py::TestAutograd::test_save_output_nr, 
test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_error_propagation, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_exit_during_bw_no_crash, test/test_autograd.py::TestAutograd::test_saved_tensors_hook_version_counter_not_shared, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_saved_original_inplace_detach, test/test_autograd.py::TestAutograd::test_saved_variable_version_counter, test/test_autograd.py::TestAutograd::test_saved_variables_deprecated, test/test_autograd.py::TestAutograd::test_saving_variable_to_disk, test/test_autograd.py::TestAutograd::test_scalar_grad_mixed_device, test/test_autograd.py::TestAutograd::test_select_expanded_v, test/test_autograd.py::TestAutograd::test_select_sum, test/test_autograd.py::TestAutograd::test_set_data_preserve_pyobj, test/test_autograd.py::TestAutograd::test_set_data_self_requires_grad, test/test_autograd.py::TestAutograd::test_set_data_tensorimpl_type, test/test_autograd.py::TestAutograd::test_set_grad_coroutines, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_benign_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_critical_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_exit, test/test_autograd.py::TestAutograd::test_set_grad_enabled, test/test_autograd.py::TestAutograd::test_set_grad_enabled_wraps, 
test/test_autograd.py::TestAutograd::test_set_grad_generator_functions, test/test_autograd.py::TestAutograd::test_set_grad_generator_functions_recursive, test/test_autograd.py::TestAutograd::test_set_materialize_non_diff_grads, test/test_autograd.py::TestAutograd::test_setitem, test/test_autograd.py::TestAutograd::test_setitem_mask, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_not_fail, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_use_inner, test/test_autograd.py::TestAutograd::test_setup_context_when_forward_has_default_args, test/test_autograd.py::TestAutograd::test_shape, test/test_autograd.py::TestAutograd::test_sharded_grad, test/test_autograd.py::TestAutograd::test_simple_reentrant, test/test_autograd.py::TestAutograd::test_slice_expanded_v, test/test_autograd.py::TestAutograd::test_sparse_gather_both_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_dim0, test/test_autograd.py::TestAutograd::test_sparse_gather_dim1, test/test_autograd.py::TestAutograd::test_sparse_gather_dim_neg, test/test_autograd.py::TestAutograd::test_sparse_gather_ind_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_x_scalar, test/test_autograd.py::TestAutograd::test_sparse_mm_backward, test/test_autograd.py::TestAutograd::test_tensor_grad_warnings, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_over_view, test/test_autograd.py::TestAutograd::test_thread_shutdown, test/test_autograd.py::TestAutograd::test_to_sparse_backward, test/test_autograd.py::TestAutograd::test_too_many_grads, test/test_autograd.py::TestAutograd::test_type_conversions, test/test_autograd.py::TestAutograd::test_unpack_hooks_exec_count, test/test_autograd.py::TestAutograd::test_unrelated_inputs, 
test/test_autograd.py::TestAutograd::test_unsafe_set_version_counter, test/test_autograd.py::TestAutograd::test_unused_grad_requires_grad_with_materialize, test/test_autograd.py::TestAutograd::test_unused_output, test/test_autograd.py::TestAutograd::test_var_mean_differentiable, test/test_autograd.py::TestAutograd::test_variable_traverse, test/test_autograd.py::TestAutograd::test_version_counter, test/test_autograd.py::TestAutograd::test_view_func_replay, test/test_autograd.py::TestAutograd::test_view_func_replay_with_modified_state, test/test_autograd.py::TestAutograd::test_view_replay_enabled, test/test_autograd.py::TestAutograd::test_volatile_deprecated, test/test_autograd.py::TestAutograd::test_will_engine_execute_node, test/test_autograd.py::TestAutograd::test_wrapped_number_saved_tensors_hooks, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_not_same_layout, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_same_layout, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_metadata_check_for_storage_numel_skipped, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_basic, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_not_same_layout, test/test_autograd.py::TestAutogradForwardMode::test_advanced_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_backward_graph_destruction, test/test_autograd.py::TestAutogradForwardMode::test_basic_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_codegen_ignores_undefined_outputs, test/test_autograd.py::TestAutogradForwardMode::test_create_new_zeros_with_same_meta, test/test_autograd.py::TestAutogradForwardMode::test_default_level, test/test_autograd.py::TestAutogradForwardMode::test_detach_view_tracking, test/test_autograd.py::TestAutogradForwardMode::test_forward_level_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_fwd_grad_enabled, 
test/test_autograd.py::TestAutogradForwardMode::test_grad_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_forbid_integral_dtype, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_inference_tensor_in_inference_mode, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_torch_dispatch, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_check_conj, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_ignores_size_zero, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_storage_numel, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_ignore_storage_offset_for_zero_numel_tensor, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_conj_bit, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_neg_bit, test/test_autograd.py::TestAutogradForwardMode::test_nested_level, test/test_autograd.py::TestAutogradForwardMode::test_non_differentiable, test/test_autograd.py::TestAutogradForwardMode::test_out_variant, test/test_autograd.py::TestAutogradForwardMode::test_print, test/test_autograd.py::TestAutogradForwardMode::test_set_fw_grad_having_own_fw_grad_at_same_level, test/test_autograd.py::TestAutogradForwardMode::test_set_fwd_grad_enabled, test/test_autograd.py::TestAutogradForwardMode::test_size_check, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_always_creates_a_view, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_differentiable_views, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_non_differentiable_views, test/test_autograd.py::TestAllowMutationOnSaved::test_backward_out_of_context, test/test_autograd.py::TestAllowMutationOnSaved::test_basic, test/test_autograd.py::TestAllowMutationOnSaved::test_disallow_nesting, test/test_autograd.py::TestAllowMutationOnSaved::test_double_backward, 
test/test_autograd.py::TestAllowMutationOnSaved::test_inplace_foreach, test/test_autograd.py::TestAllowMutationOnSaved::test_save_base_and_modify_view, test/test_autograd.py::TestAllowMutationOnSaved::test_save_view_modify_base, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_but_not_anymore, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_different_versions, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_many_times, test/test_autograd.py::TestAllowMutationOnSaved::test_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_math_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_out_variant, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_context_manager, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_decorator, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_existing_autograd_session, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_direct_view_on_rebase, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_indirect_view_on_rebase, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_tensor_creation, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_functional_op, 
test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_normal_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_normal_mode, test/test_autograd.py::TestMultithreadAutograd::test_cat_stack_r_to_c, test/test_autograd.py::TestMultithreadAutograd::test_custom_function_propagates_errors_from_device_thread, test/test_autograd.py::TestMultithreadAutograd::test_dataparallel_saved_tensors_hooks, test/test_autograd.py::TestMultithreadAutograd::test_fork_join_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_all_hooks, test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestMultithreadAutograd::test_multithreaded_exception_propagation, test/test_autograd.py::TestMultithreadAutograd::test_preserve_backtrace, test/test_autograd.py::TestMultithreadAutograd::test_python_thread_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_set_multithreading_enabled_as_context_manager_and_function, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward_same_input, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_True, 
test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop_no_recompution_needed, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_True, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_bad_inputs, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_can_only_trigger_recompute_once, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_flops_and_mem, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_more_than_one_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_non_tensor_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_output_already_has_autograd_meta, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_policy_with_state, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_storage_lifetime, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_subclass_dispatching_sizes, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_version_counter, test/test_autograd.py::TestAutogradComplex::test_view_func_for_complex_views, 
test/test_autograd.py::TestAutogradComplex::test_view_with_multi_output, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_logging_tensor, test/test_autograd.py::TestAutogradLogging::test_logging, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_large_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_memory_format_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_backward_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_complex_scalar_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy__cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_broadcasting_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_same_layout_copies_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_cross_device_reentrant_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_free_unneeded_tensor_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_grad_assignment_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_gradcheck_input_output_different_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_multiple_output_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_gradcheck_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_makes_base_require_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_modify_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_safe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_unsafe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multiple_outputs_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_non_contig_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_multiple_output_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_python_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_then_no_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_undefined_grad_output_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inputbuffer_add_multidevice_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_min_max_median_backprops_to_all_values_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_mv_grad_stride_0_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_non_differentiable_ops_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_parameter_resize_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pin_memory_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pow_real_negative_base_complex_exponent_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_itt_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_nvtx_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pyscalar_conversions_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_reentrant_parent_error_on_cpu_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_resize_version_bump_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_rnn_backward_to_input_but_not_parameters_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_amin_amax_backprops_to_all_values_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_prod_gradgrad_error_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float16, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int16, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int8, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_simple_reentrant_cross_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_complex128, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_complex128, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_mask_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_strided_leaf_grad_layout_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_to_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_unused_output_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_warning_in_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_functional_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_scalar_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_zero_dim_param_mixed_device_grad_cuda, 
test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_atan2_zero_gradient_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_composite_implicit_and_dispatch_registration_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_multiple_dispatch_registrations_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_single_threaded_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_tls_stash_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_foward_mode_AD_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_is_retain_graph_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_per_dispatch_key_input_saving_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_set_sequence_nr_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_view_copy_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_multi_producer_case_4_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_2_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_3_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_3_correctness_non_default_ambient_stream_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_4_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_side_stream_backward_overlap_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_warn_on_accumulate_grad_stream_mismatch_flag_cuda 2025-12-04T12:15:29.4605023Z 2025-12-04T12:15:29.4605360Z Finished test_autograd 1/1 ... 
[2025-12-04 12:15:29.409626][10887.019532931], took 1.44min 2025-12-04T12:15:29.4606446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.xml 2025-12-04T12:15:29.5275811Z Running test_sparse 1/2 ... [2025-12-04 12:15:29.527270][10887.137177326] 2025-12-04T12:15:29.5276536Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:15:29.5279842Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:29.527688] 2025-12-04T12:20:11.2431182Z 2025-12-04T12:20:11.2432092Z test_sparse 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_1.2_170c4a4cb63931fe_.log 2025-12-04T12:20:11.3072714Z Running 1525 items in this shard: test/test_sparse.py::TestSparseOneOff::test_cuda_from_cpu, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64, 
test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_any_cuda, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_assign_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dtypes_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda, test/test_sparse.py::TestSparseCUDA::test_is_sparse_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mv_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_resize_as_cuda, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32, 
test/test_sparse.py::TestSparseCUDA::test_shared_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16, 
test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64, 
test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda
2025-12-04T12:20:11.3699945Z
2025-12-04T12:20:11.3700271Z Finished test_sparse 1/2 ... [2025-12-04 12:20:11.245129][11168.855034908], took 4.70min
2025-12-04T12:20:11.3701480Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.xml
2025-12-04T12:20:11.3878103Z Running test_decomp 2/17 ... [2025-12-04 12:20:11.387525][11168.997430279]
2025-12-04T12:20:11.3878614Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:11.3881961Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=2', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:11.387961]
2025-12-04T12:29:56.4209807Z
2025-12-04T12:29:56.4210689Z test_decomp 2/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_2.17_4858d88ccf44ed88_.log
2025-12-04T12:29:56.4414942Z Running 535 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_kaiser_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_real_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_eval_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_train_mode_cuda_float32, test/test_decomp.py::HasDecompTest::test_aten_core_operators 
2025-12-04T12:29:56.4612784Z
2025-12-04T12:29:56.4613099Z Finished test_decomp 2/17 ... [2025-12-04 12:29:56.421665][11754.031569993], took 9.75min
2025-12-04T12:29:56.4614238Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.xml
2025-12-04T12:29:56.5374105Z Running test_decomp 7/17 ... [2025-12-04 12:29:56.537081][11754.146987533]
2025-12-04T12:29:56.5374648Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:29:56.5377444Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=7', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:29:56.537519]
2025-12-04T12:40:05.6661963Z
2025-12-04T12:40:05.6663035Z test_decomp 7/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_7.17_ecdc7da48044ddba_.log
2025-12-04T12:40:05.6880707Z Running 583 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_bfloat16,
test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_istft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_blackman_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int32, test/test_decomp.py::DecompOneOffTestsCUDA::test_exponential_non_inf_cuda 2025-12-04T12:40:05.7095774Z 2025-12-04T12:40:05.7096100Z Finished 
test_decomp 7/17 ... [2025-12-04 12:40:05.666866][12363.276773316], took 10.15min 2025-12-04T12:40:05.7097140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.xml 2025-12-04T12:40:07.0285465Z Uploading artifacts took 1.15 seconds 2025-12-04T12:40:07.0289201Z Running test_decomp 12/17 ... [2025-12-04 12:40:07.028730][12364.638637754] 2025-12-04T12:40:07.0289716Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:40:07.0294098Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=12', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:07.029164] 2025-12-04T12:50:28.0101285Z 2025-12-04T12:50:28.0102325Z test_decomp 12/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_12.17_884069b3bca145fc_.log 2025-12-04T12:50:28.0300990Z Running 526 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_nuc_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_quantile_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch__scaled_mm_v2_cuda_float8_e4m3fn, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_int32, test/test_decomp.py::DecompOneOffTestsCUDA::test_elu_backward_cuda, test/test_decomp.py::HasDecompTest::test_mm_decompose_mm_dde 2025-12-04T12:50:28.0505065Z 2025-12-04T12:50:28.0505404Z Finished test_decomp 12/17 ... 
[2025-12-04 12:50:28.010711][12985.620617205], took 10.35min 2025-12-04T12:50:28.0506454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.xml 2025-12-04T12:50:28.1219671Z Running test_decomp 17/17 ... [2025-12-04 12:50:28.121624][12985.731530802] 2025-12-04T12:50:28.1220377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:50:28.1223198Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=17', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:50:28.122058] 2025-12-04T12:59:27.6090513Z 2025-12-04T12:59:27.6091440Z test_decomp 17/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_17.17_4ba2ec57e0bb6714_.log 2025-12-04T12:59:27.6292837Z Running 535 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pdist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_roll_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_vdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e5m2, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_bool, 
test/test_decomp.py::DecompOneOffTestsCUDA::test_native_layer_norm_cpu_decomp_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float32 2025-12-04T12:59:27.6490448Z 2025-12-04T12:59:27.6490796Z Finished test_decomp 17/17 ... [2025-12-04 12:59:27.609603][13525.219510278], took 8.99min 2025-12-04T12:59:27.6491837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.xml 2025-12-04T12:59:27.7248121Z Running test_meta 5/5 ... [2025-12-04 12:59:27.724444][13525.334350814] 2025-12-04T12:59:27.7248716Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:59:27.7251829Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:27.724878] 2025-12-04T13:25:55.7017850Z 2025-12-04T13:25:55.7018822Z test_meta 5/5 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_5.5_1a0c05f4e7432569_.log 2025-12-04T13:25:56.0416318Z Running 8325 items in this shard: test/test_meta.py::TestMetaConverter::test_channels_last_non_leaf, test/test_meta.py::TestMetaConverter::test_tensor_outlives_converter, test/test_meta.py::TestMetaConverter::test_view_of_leaf, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_eq_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_unpack_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_atan_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ceil_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_floor_divide_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigh_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose1d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_upsample_nearest_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_list_args_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fnuz, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_exponential_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rmul___cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_no_dim_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_logsigmoid_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_put_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_where_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_empty_quantized_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask4_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_meta__fused_moving_avg_obs_fq_helper_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_cosine_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_real_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float16_float16_cuda, test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float32_float32_cuda, test/test_meta.py::TestMetaCUDA::test_nan_to_num_cuda, test/test_meta.py::TestMetaCUDA::test_nonzero_cuda 2025-12-04T13:25:56.3781506Z 2025-12-04T13:25:56.3781809Z Finished test_meta 5/5 ... 
[2025-12-04 13:25:55.712663][15113.322566255], took 26.47min
2025-12-04T13:25:56.3782840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.xml
2025-12-04T13:25:57.1767026Z Uploading artifacts took 1.17 seconds
2025-12-04T13:25:57.1771111Z Running test_nestedtensor 1/4 ... [2025-12-04 13:25:57.176936][15114.786843181]
2025-12-04T13:25:57.1771730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:25:57.1776382Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:25:57.177390]
2025-12-04T13:33:42.7729421Z
2025-12-04T13:33:42.7733145Z test_nestedtensor 1/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_1.4_6dff2e85dc80cacf_.log
2025-12-04T13:33:42.7968742Z Running 408 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_cat, test/test_nestedtensor.py::TestNestedTensor::test_copy_, test/test_nestedtensor.py::TestNestedTensor::test_jagged_with_dim_error,
test/test_nestedtensor.py::TestNestedTensor::test_like_functions_randn_like, test/test_nestedtensor.py::TestNestedTensor::test_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_size_dim, test/test_nestedtensor.py::TestNestedTensor::test_unbind_4, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_eq_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_ge_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_is_all_true_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_is_any_true_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_share_memory_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_logical_not_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_neg_cuda, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_add_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_sub_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1023_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_mask_and_to_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_to_padded_tensor_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_unbind_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_broadcasting_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_chunk_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_padded_dense_conversion_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flatten_decomp_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_index_put_error_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_contiguous_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_full_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_5_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_with_sizes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_last_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acos_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_minimum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_prelu_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_byte_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfc_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_or_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_xor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_binary_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_h_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_zeta_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_power_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_pow_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_spherical_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unflatten_cuda_float32 2025-12-04T13:33:42.8203627Z 2025-12-04T13:33:42.8203974Z Finished test_nestedtensor 1/4 ... [2025-12-04 13:33:42.773579][15580.383484573], took 7.76min 2025-12-04T13:33:42.8205185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.xml 2025-12-04T13:33:42.8949140Z Running test_nestedtensor 4/4 ... 
[2025-12-04 13:33:42.894273][15580.504178771] 2025-12-04T13:33:42.8949847Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:33:42.8951755Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:33:42.894712] 2025-12-04T13:44:53.6410368Z 2025-12-04T13:44:53.6411718Z test_nestedtensor 4/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_4.4_fadd9c2633e00561_.log 2025-12-04T13:44:53.6648278Z Running 415 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_default_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_nested_namespace, test/test_nestedtensor.py::TestNestedTensor::test_numel, test/test_nestedtensor.py::TestNestedTensor::test_size, test/test_nestedtensor.py::TestNestedTensor::test_stride, test/test_nestedtensor.py::TestNestedTensor::test_unbind_dim, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_device_checks_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isnan_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float64, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_4_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_masked_fill_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_split_with_sizes_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_apply__cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_same_size_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_ones_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randint_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_zeros_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_activation_checkpoint_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_permute_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_0_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unsafe_view_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_views_inherit_ragged_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_and_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_u_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bool_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_le_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_2_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32 2025-12-04T13:44:53.6881603Z 2025-12-04T13:44:53.6881999Z Finished test_nestedtensor 4/4 ... 
[2025-12-04 13:44:53.641576][16251.251482241], took 11.18min 2025-12-04T13:44:53.6883171Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.xml 2025-12-04T13:44:53.8142483Z Running test_ops 5/11 ... [2025-12-04 13:44:53.813970][16251.423877342] 2025-12-04T13:44:53.8142979Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T13:44:53.8146570Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=5', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:53.814438] 2025-12-04T14:05:21.9208174Z 2025-12-04T14:05:21.9209043Z test_ops 5/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_5.11_352ce2577683b96d_.log 2025-12-04T14:05:22.0437740Z Running 3037 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing__chunk_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_permuted_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_static_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_errors_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cov_cuda, test/test_ops.py::TestCommonCUDA::test_errors_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout0_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cummax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_reciprocal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_permuted_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_strided_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_asin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_torch__scaled_mm_v2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__flash_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_number_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__chunk_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rms_norm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_count_nonzero_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_median_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sigmoid_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fftn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_group_norm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vsplit_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_permuted_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_expm1_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__chunk_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view__batch_norm_with_update_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_select_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_strided_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_permuted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float16, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float64, test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_polar_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_take_along_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_unsafe_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64 2025-12-04T14:05:22.1633188Z 2025-12-04T14:05:22.1633508Z Finished test_ops 
5/11 ... [2025-12-04 14:05:21.924683][17479.53458861], took 20.47min 2025-12-04T14:05:22.1634534Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.xml 2025-12-04T14:05:23.5040039Z Uploading artifacts took 1.37 seconds 2025-12-04T14:05:23.5044416Z Running test_ops 10/11 ... [2025-12-04 14:05:23.504261][17481.114167405] 2025-12-04T14:05:23.5044895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:05:23.5049221Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=10', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:05:23.504717] 2025-12-04T14:26:50.4287821Z 2025-12-04T14:26:50.4288726Z test_ops 10/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_10.11_9feb13593ea58df6_.log 2025-12-04T14:26:50.5496830Z Running 2991 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nanmean_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diff_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mul_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ne_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_triu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_static_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_vstack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_put_accumulate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_alias_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logaddexp2_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_remainder_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__chunk_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isclose_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_or_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unbind_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nanmean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_item_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_lengths_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_3d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_in_place_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_mm_reduce_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__chunk_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__unsafe_masked_index_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cauchy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, test/test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_item_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_renorm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_in_place_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_no_rounding_mode_cuda_float32 2025-12-04T14:26:50.6673467Z 2025-12-04T14:26:50.6673781Z Finished test_ops 10/11 ... [2025-12-04 14:26:50.432867][18768.04277075], took 21.45min 2025-12-04T14:26:50.6674795Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.xml 2025-12-04T14:26:51.8004059Z Uploading artifacts took 1.18 seconds 2025-12-04T14:26:51.8007955Z Running functorch/test_ops 2/7 ... [2025-12-04 14:26:51.800613][18769.410519202] 2025-12-04T14:26:51.8008488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:26:51.8013194Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=2', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:26:51.801061] 2025-12-04T14:38:56.6341957Z 2025-12-04T14:38:56.6342940Z functorch/test_ops 2/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_2.7_066e83f50e6dcbea_.log 2025-12-04T14:38:56.7026543Z Running 1440 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_cross_entropy_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_double_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_margin_ranking_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_inverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest-exact_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_kaiser_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log1p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_ceil_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex128, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_expand_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_expand_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_list_args_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_complex_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_add_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pinverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vdot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_variadic_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_sparse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__softmax_backward_data_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_full_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_batch_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_zeta_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rpow___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_contiguous_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_no_rounding_mode_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_histc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_unary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kthvalue_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cond_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_householder_product_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_bag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardswish_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_static_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rot90_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_true_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_xlogy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_zeta_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_det_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_h_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svd_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_ctc_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_embed_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_le_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_local_response_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_neg_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_hermitian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_glu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_positive_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trunc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyTakeAutogradFunction_cuda_float32 2025-12-04T14:38:56.7693154Z 2025-12-04T14:38:56.7693522Z Finished functorch/test_ops 2/7 ... [2025-12-04 14:38:56.636311][19494.246216621], took 12.08min 2025-12-04T14:38:56.7694714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.xml 2025-12-04T14:38:56.7809992Z Running functorch/test_ops 7/7 ... [2025-12-04 14:38:56.780708][19494.390614885] 2025-12-04T14:38:56.7810543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:38:56.7813745Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=7', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:38:56.781152] 2025-12-04T14:50:34.1726930Z 2025-12-04T14:50:34.1727898Z functorch/test_ops 7/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_7.7_c87f7efa94ae13b4_.log 2025-12-04T14:50:34.2406387Z Running 1436 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diff_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vander_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pinverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_blackman_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_2inputs_2outputs_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_4_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_list_args_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unsqueeze_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_fill_functorch_Scalar_only_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resize_as__cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_inverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cross_entropy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softplus_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapz_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyTakeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_4inputs_with_extra_args_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log1p_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nansum_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_unshuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtri_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isclose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_solve_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pow_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_topk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_functorch_Scalar_only_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ops_aten_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tile_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_combinations_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_i0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signbit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_chunk_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ceil_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bmm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kthvalue_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_minimum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hash_tensor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_reduction_no_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32 2025-12-04T14:50:34.3067721Z 2025-12-04T14:50:34.3068084Z Finished functorch/test_ops 7/7 ... 
[2025-12-04 14:50:34.174764][20191.784668984], took 11.62min 2025-12-04T14:50:34.3069254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.xml 2025-12-04T14:50:35.5274476Z Uploading artifacts took 1.17 seconds 2025-12-04T14:50:35.5278397Z Running inductor/test_max_autotune 1/1 ... [2025-12-04 14:50:35.527663][20193.137569045] 2025-12-04T14:50:35.5279008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:50:35.5283365Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_max_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:50:35.528095] 2025-12-04T14:50:45.1431166Z 2025-12-04T14:50:45.1432197Z inductor/test_max_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_max_autotune_1.1_dc9c21bc2c4ad5fc_.log 2025-12-04T14:50:45.1433019Z 2025-12-04T14:50:45.1433372Z Finished inductor/test_max_autotune 1/1 ... [2025-12-04 14:50:45.142893][20202.75280313], took 0.16min 2025-12-04T14:50:45.1712884Z Running inductor/test_cpu_repro 3/3 ... [2025-12-04 14:50:45.171057][20202.7809665] 2025-12-04T14:50:45.1713440Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T14:50:45.1716878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:50:45.171434] 2025-12-04T15:03:50.0355939Z 2025-12-04T15:03:50.0357404Z inductor/test_cpu_repro 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_3.3_41613d465af9d6d5_.log 2025-12-04T15:03:50.0531902Z Running 230 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_acosh_with_negative_large_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_argmax_argmin_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_argmin, test/inductor/test_cpu_repro.py::CPUReproTests::test_atomic_add_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_right_shift, test/inductor/test_cpu_repro.py::CPUReproTests::test_bool_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_broadcast_mul_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_cat_mul, test/inductor/test_cpu_repro.py::CPUReproTests::test_channel_shuffle_cl_output, test/inductor/test_cpu_repro.py::CPUReproTests::test_complex_memory_overlap, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv2d_bn_mixed_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_transpose2d_has_output_size_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_transpose2d_packed_cpu, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_double_to_fp32_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_to_double_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int64_to_int32_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int8_to_half_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_cpu_vec_cosim, test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_dequant_relu_quant_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_maxpool2d_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_fp8_e4m3, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_fp8_e5m2, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_disabled_amp_is_inference_True, test/inductor/test_cpu_repro.py::CPUReproTests::test_dropout, test/inductor/test_cpu_repro.py::CPUReproTests::test_embedding_vec_bf16, test/inductor/test_cpu_repro.py::CPUReproTests::test_expr_vec_non_contiguous, test/inductor/test_cpu_repro.py::CPUReproTests::test_float32_to_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp32_load_with_to_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_15,3,13, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_float32_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_fractional_max_pool2d_3d_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_full_boolean_dynamic_shape, test/inductor/test_cpu_repro.py::CPUReproTests::test_fused_attention_conv, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_large_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_large_size, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_highp_to_lowp_cse_var_cache_with_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_horizontal_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_put2, test/inductor/test_cpu_repro.py::CPUReproTests::test_int64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_int_div_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_invalid_dropout_args, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_buffer_reuse, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_float64, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_packed, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_with_no_default_contiguous_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_local_buffer_with_line_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_max_reduction_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_meta_device, test/inductor/test_cpu_repro.py::CPUReproTests::test_module_buffer_mutation, test/inductor/test_cpu_repro.py::CPUReproTests::test_new_vec_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_fold, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_reduction_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_looop_fusion_with_local_buf, test/inductor/test_cpu_repro.py::CPUReproTests::test_pow_cos, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_permute_reshape_reinterpret_view, test/inductor/test_cpu_repro.py::CPUReproTests::test_repeated_exp, test/inductor/test_cpu_repro.py::CPUReproTests::test_require_stride_order_non_owning, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add_vec, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_set_source_Tensor, test/inductor/test_cpu_repro.py::CPUReproTests::test_sigmoid_with_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_symbolic_shape_scalar_value_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_tanh_atan2, test/inductor/test_cpu_repro.py::CPUReproTests::test_tanh_atan2_use_decompose_tanh, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_load_decomposed_dequant_add_relu_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_store_channel_shuffle_cl_quant_output_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_bool_float, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_mxn_32_32_bf16_fp16, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_sum_outer, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_with_norm, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint32_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_sub, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_contiguous_ModularIndexing, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_kernel_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_remainder, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_transpose_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_vertical_sum_cpu_only 2025-12-04T15:03:50.0702239Z 2025-12-04T15:03:50.0702626Z Finished inductor/test_cpu_repro 3/3 ... [2025-12-04 15:03:50.036011][20987.64591466], took 13.08min 2025-12-04T15:03:50.0703863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.xml 2025-12-04T15:03:50.1745435Z Running inductor/test_mkldnn_pattern_matcher 2/3 ... 
[2025-12-04 15:03:50.174248][20987.784155877] 2025-12-04T15:03:50.1746059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:03:50.1749443Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:50.174672] 2025-12-04T15:10:35.2469615Z 2025-12-04T15:10:35.2471893Z inductor/test_mkldnn_pattern_matcher 2/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_2.3_52e8559de495a0be_.log 2025-12-04T15:10:35.2542357Z Running 99 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_pass_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_hardtanh_pattern_fallback, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_unary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_multi_linear_share_same_input, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardtanh, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_relu6, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_3, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_int8_mixed_bf16_use_autocast, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu6_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_fp8_inductor_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_sum_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qmaxpool2d, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_121253_issue_addmm_fusion_check, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_False_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int4_cpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int8, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_q_attention_block 2025-12-04T15:10:35.2610033Z 2025-12-04T15:10:35.2610453Z Finished inductor/test_mkldnn_pattern_matcher 2/3 ... [2025-12-04 15:10:35.246932][21392.856840636], took 6.75min 2025-12-04T15:10:35.2756013Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.xml 2025-12-04T15:10:35.3740342Z Running inductor/test_cpu_select_algorithm 1/1 ... [2025-12-04 15:10:35.373650][21392.983558377] 2025-12-04T15:10:35.3741013Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:10:35.3743713Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:35.374076] 2025-12-04T15:10:47.5657682Z 2025-12-04T15:10:47.5658762Z inductor/test_cpu_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_select_algorithm_1.1_2b85f4e0fd3f066c_.log 2025-12-04T15:10:47.5659818Z Running 0 items in this shard: 2025-12-04T15:10:47.5660029Z 2025-12-04T15:10:47.5660418Z Finished inductor/test_cpu_select_algorithm 1/1 ... [2025-12-04 15:10:47.565542][21405.17545116], took 0.20min 2025-12-04T15:10:47.5942417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.xml 2025-12-04T15:10:48.8886576Z Uploading artifacts took 1.22 seconds 2025-12-04T15:10:48.8890986Z Running test_custom_ops 1/1 ... 
[2025-12-04 15:10:48.888875][21406.498782603] 2025-12-04T15:10:48.8891594Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:10:48.8895503Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_custom_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:48.889311] 2025-12-04T15:11:31.7664530Z 2025-12-04T15:11:31.7667693Z test_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_custom_ops_1.1_37d60717605e8cfe_.log 2025-12-04T15:11:31.7770181Z Running 282 items in this shard: test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_meta, test/test_custom_ops.py::TestCustomOp::test_autogen_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_autograd_function_backed_op, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented_gradmode, test/test_custom_ops.py::TestCustomOp::test_backward_dict_grad_for_nontensor, test/test_custom_ops.py::TestCustomOp::test_backward_dict_invalid_keys, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_optional_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_grads_are_tensor_or_none, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_CompositeImplicitAutograd, 
test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_views, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_Autograd, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCPU, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_non_tensor, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_numel, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_tensorlist, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_type, test/test_custom_ops.py::TestCustomOp::test_backward_partially_registered, test/test_custom_ops.py::TestCustomOp::test_backward_returns_dict, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_none_or_Tensor, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/test_custom_ops.py::TestCustomOp::test_basic_make_fx, test/test_custom_ops.py::TestCustomOp::test_builtin_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_builtin_torchscript_ops, test/test_custom_ops.py::TestCustomOp::test_data_dependent_basic, test/test_custom_ops.py::TestCustomOp::test_data_dependent_compile, test/test_custom_ops.py::TestCustomOp::test_data_dependent_fake_tracing, test/test_custom_ops.py::TestCustomOp::test_data_dependent_nms_dynamic_compile, test/test_custom_ops.py::TestCustomOp::test_define_and_impl, 
test/test_custom_ops.py::TestCustomOp::test_define_bad_schema, test/test_custom_ops.py::TestCustomOp::test_define_validation, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_list, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_single, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_tuple, test/test_custom_ops.py::TestCustomOp::test_defined_in_python, test/test_custom_ops.py::TestCustomOp::test_duplicate_impl, test/test_custom_ops.py::TestCustomOp::test_functionalize_error, test/test_custom_ops.py::TestCustomOp::test_impl_abstract_overload, test/test_custom_ops.py::TestCustomOp::test_impl_cpu, test/test_custom_ops.py::TestCustomOp::test_impl_device_cpu, test/test_custom_ops.py::TestCustomOp::test_impl_device_cuda, test/test_custom_ops.py::TestCustomOp::test_impl_device_function, test/test_custom_ops.py::TestCustomOp::test_impl_device_invalid, test/test_custom_ops.py::TestCustomOp::test_impl_function, test/test_custom_ops.py::TestCustomOp::test_impl_invalid_devices, test/test_custom_ops.py::TestCustomOp::test_impl_meta, test/test_custom_ops.py::TestCustomOp::test_impl_multiple, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CUDA, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_separate, test/test_custom_ops.py::TestCustomOp::test_incorrect_schema_types, test/test_custom_ops.py::TestCustomOp::test_infer_schema_no_return, test/test_custom_ops.py::TestCustomOp::test_infer_schema_supported, test/test_custom_ops.py::TestCustomOp::test_infer_schema_unsupported, 
test/test_custom_ops.py::TestCustomOp::test_invalid_qualname, test/test_custom_ops.py::TestCustomOp::test_invalid_schemas, test/test_custom_ops.py::TestCustomOp::test_is_functional_schema, test/test_custom_ops.py::TestCustomOp::test_is_tensorlist_like_type, test/test_custom_ops.py::TestCustomOp::test_legacy_define, test/test_custom_ops.py::TestCustomOp::test_legacy_impl, test/test_custom_ops.py::TestCustomOp::test_lifetime, test/test_custom_ops.py::TestCustomOp::test_load_library, test/test_custom_ops.py::TestCustomOp::test_meta_for_data_dependent_shape_operation, test/test_custom_ops.py::TestCustomOp::test_name_must_match, test/test_custom_ops.py::TestCustomOp::test_new_data_dependent_symint, test/test_custom_ops.py::TestCustomOp::test_not_implemented_error, test/test_custom_ops.py::TestCustomOp::test_override_cea, test/test_custom_ops.py::TestCustomOp::test_override_fake, test/test_custom_ops.py::TestCustomOp::test_override_impl, test/test_custom_ops.py::TestCustomOp::test_override_meta, test/test_custom_ops.py::TestCustomOp::test_private_ctor, test/test_custom_ops.py::TestCustomOp::test_reserved_ns, test/test_custom_ops.py::TestCustomOp::test_resolve_packet, test/test_custom_ops.py::TestCustomOp::test_save_for_backward_inputs_are_namedtuple, test/test_custom_ops.py::TestCustomOp::test_schema_matches_signature, test/test_custom_ops.py::TestCustomOp::test_sequences, test/test_custom_ops.py::TestCustomOp::test_supported_param_types, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_multi_return, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_single_return, test/test_custom_ops.py::TestCustomOp::test_supported_schemas, test/test_custom_ops.py::TestCustomOp::test_symints, test/test_custom_ops.py::TestCustomOp::test_unsupported_param_types, test/test_custom_ops.py::TestCustomOp::test_unsupported_schemas, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error, 
test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_inplace, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm, 
test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_dont_generate, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_inplace, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_inplace, test/test_custom_ops.py::MiniOpTest::test_mm, test/test_custom_ops.py::MiniOpTest::test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_sin_, 
test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_schema__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_schema__test_inplace, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_schema__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_nonzero, test/test_custom_ops.py::TestCustomOpAPI::test_any_output_is_alias_to_input_or_output, test/test_custom_ops.py::TestCustomOpAPI::test_any_requires_grad, test/test_custom_ops.py::TestCustomOpAPI::test_basic, test/test_custom_ops.py::TestCustomOpAPI::test_compile, test/test_custom_ops.py::TestCustomOpAPI::test_default_values, test/test_custom_ops.py::TestCustomOpAPI::test_disallows_output_aliasing, test/test_custom_ops.py::TestCustomOpAPI::test_factory_function, test/test_custom_ops.py::TestCustomOpAPI::test_fake, test/test_custom_ops.py::TestCustomOpAPI::test_kwarg_only_tensors, test/test_custom_ops.py::TestCustomOpAPI::test_layout_constraint_tags, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_invalid, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_with_conditional_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_list_input, 
test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times_different_devices, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_0, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_1, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_2, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_3, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_4, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_5, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_mode, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_subclass, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_library_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_op_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times_2, 
test/test_custom_ops.py::TestCustomOpAPI::test_library_schema_infer, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema_error, test/test_custom_ops.py::TestCustomOpAPI::test_multi_types, test/test_custom_ops.py::TestCustomOpAPI::test_mutated, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_error, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_unknown, test/test_custom_ops.py::TestCustomOpAPI::test_no_grad_skips_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_overloading, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_error_cases, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_replacement, test/test_custom_ops.py::TestCustomOpAPI::test_set_kernel_enabled, test/test_custom_ops.py::TestCustomOpAPI::test_split_device, test/test_custom_ops.py::TestCustomOpAPI::test_subclass_accessor_view, test/test_custom_ops.py::TestCustomOpAPI::test_subclass_accessor_view_error, test/test_custom_ops.py::TestCustomOpAPI::test_supports_tensorlist, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_dynamic__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_static__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_autograd_registration__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_faketensor__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_nonzero, 
test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_sin_, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_delayed_error, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTestOther::test_schema__test_nonzero_again, test/test_custom_ops.py::TestGenerateOpcheckTests::test_MiniOpTest, test/test_custom_ops.py::TestGenerateOpcheckTests::test_dont_generate_decorator, test/test_custom_ops.py::TestGenerateOpcheckTests::test_failures_dict_validation, test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_no_save_data, test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_save_data, test/test_custom_ops.py::TestGenerateOpcheckTests::test_is_inside_opcheck_mode, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_bad_op, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_customopdef, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_does_not_require_extra_deps, test/test_custom_ops.py::TestTypeConversion::test_mixed_types, test/test_custom_ops.py::TestTypeConversion::test_optional, test/test_custom_ops.py::TestTypeConversion::test_simple_tuple, test/test_custom_ops.py::TestTypeConversion::test_supported_types, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_custom_op, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_impl, test/test_custom_ops.py::TestOpProfiles::test_fake_registration, test/test_custom_ops.py::TestOpProfiles::test_save_to_file, test/test_custom_ops.py::TestOpProfiles::test_version, test/test_custom_ops.py::TestOpProfiles::test_yaml, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_False_cuda, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_True_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_False_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_True_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_assert_raises_regex_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registered_at_backend_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_autograd_kernel_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_compositeimplicitautograd_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_composite_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_global_state_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_view_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_functionalization_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_fails_basic_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCatCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCubeCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulScalarCustomOp_cuda_float32, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNMSCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNonzeroCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySortCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyTakeCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyViewCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_unbacked_stride_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_single_element_tuple_output_cuda 2025-12-04T15:11:31.7871366Z 2025-12-04T15:11:31.7871770Z Finished test_custom_ops 1/1 ... [2025-12-04 15:11:31.766701][21449.376607372], took 0.71min 2025-12-04T15:11:31.7958681Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.xml 2025-12-04T15:11:31.8810246Z Running inductor/test_analysis 1/1 ... [2025-12-04 15:11:31.880722][21449.490628642] 2025-12-04T15:11:31.8810877Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:11:31.8813784Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:11:31.881139] 2025-12-04T15:11:44.1135055Z 2025-12-04T15:11:44.1136046Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_a128307487ad43a3_.log 2025-12-04T15:11:44.1149835Z Running 28 items in this shard: test/inductor/test_analysis.py::TestUtils::test_tabulate2d, test/inductor/test_analysis.py::TestUtils::test_zip_dicts, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_helper_unit_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float64, test/inductor/test_analysis.py::TestAnalysisCUDA::test_noop_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float16, 
test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float64 2025-12-04T15:11:44.1163166Z 2025-12-04T15:11:44.1163537Z Finished inductor/test_analysis 1/1 ... [2025-12-04 15:11:44.113307][21461.72321616], took 0.20min 2025-12-04T15:11:44.1422820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.xml 2025-12-04T15:11:44.2183817Z Running inductor/test_pad_mm 1/1 ... [2025-12-04 15:11:44.218067][21461.827973701] 2025-12-04T15:11:44.2184381Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:11:44.2187188Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pad_mm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:11:44.218473] 2025-12-04T15:11:54.2976612Z 2025-12-04T15:11:54.2977592Z inductor/test_pad_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_pad_mm_1.1_bfb512e8053e306d_.log 2025-12-04T15:11:54.2983636Z Running 19 items in this shard: test/inductor/test_pad_mm.py::PadMMTest::test_cat_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_cat_padding, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_padding, test/inductor/test_pad_mm.py::PadMMTest::test_no_autocast_in_pad_bmm_joint_graph_pass, test/inductor/test_pad_mm.py::PadMMTest::test_original_aten_preserved_pad_mm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_2d_bias, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_mn, test/inductor/test_pad_mm.py::PadMMTest::test_pad_batch, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_b, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_bm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_bf16, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_mnk, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_n, test/inductor/test_pad_mm.py::PadMMTest::test_pad_single_cat, test/inductor/test_pad_mm.py::PadMMTest::test_zero_dim 2025-12-04T15:11:54.2989286Z 2025-12-04T15:11:54.2989599Z Finished inductor/test_pad_mm 1/1 ... [2025-12-04 15:11:54.297439][21471.907348482], took 0.17min 2025-12-04T15:11:54.3264352Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.xml 2025-12-04T15:11:54.4044660Z Running inductor/test_triton_syntax 1/1 ... 
[2025-12-04 15:11:54.404069][21472.013976541] 2025-12-04T15:11:54.4045256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:11:54.4047844Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_syntax.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:11:54.404542] 2025-12-04T15:12:15.2499910Z 2025-12-04T15:12:15.2501555Z inductor/test_triton_syntax 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_syntax_1.1_cd6b570d7971cca9_.log 2025-12-04T15:12:15.2502928Z Running 1 items in this shard: test/inductor/test_triton_syntax.py::TestTritonSyntacticallyValid::test_triton_sqrt 2025-12-04T15:12:15.2503514Z 2025-12-04T15:12:15.2503890Z Finished inductor/test_triton_syntax 1/1 ... [2025-12-04 15:12:15.249753][21492.859660579], took 0.35min 2025-12-04T15:12:15.2792852Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.xml 2025-12-04T15:12:15.3582110Z Running inductor/test_triton_extension_backend 1/1 ... [2025-12-04 15:12:15.357834][21492.967741898] 2025-12-04T15:12:15.3582762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:12:15.3585405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_extension_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:12:15.358271] 2025-12-04T15:12:27.4791661Z 2025-12-04T15:12:27.4793048Z inductor/test_triton_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_extension_backend_1.1_e218feea67d6cd2a_.log 2025-12-04T15:12:27.4794134Z Running 0 items in this shard: 2025-12-04T15:12:27.4794360Z 2025-12-04T15:12:27.4794780Z Finished inductor/test_triton_extension_backend 1/1 ... [2025-12-04 15:12:27.478949][21505.088858698], took 0.20min 2025-12-04T15:12:27.5078133Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.xml 2025-12-04T15:12:27.5760378Z Running test_sparse_semi_structured 1/1 ... [2025-12-04 15:12:27.575708][21505.18561558] 2025-12-04T15:12:27.5760963Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:12:27.5764282Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_semi_structured.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:12:27.576140] 2025-12-04T15:12:37.7557454Z 2025-12-04T15:12:37.7558410Z test_sparse_semi_structured 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_semi_structured_1.1_4dd53f61ed651a5b_.log 2025-12-04T15:12:37.7580189Z Running 42 items in this shard: test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cusparselt, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cutlass, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_sp24_compile, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_indices, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_linear, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_min_sparse_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mlp, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_TN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_to_sparse_semi_structured, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dim, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_values, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_gemm, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_edge_case1, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_id, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_meta_correctness, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_prune_dense_static_sort, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pruning_algo_largest_abs_values_greedy, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply_dense, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_bmm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_mat_vec, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions_all_patterns, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_linear_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_sparse_semi_structured_ops_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_compile_autotune, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_search, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_csrc_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cusparselt_backend, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_fp8fp8_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm_fp8 2025-12-04T15:12:37.7600688Z 2025-12-04T15:12:37.7601248Z Finished test_sparse_semi_structured 1/1 ... [2025-12-04 15:12:37.755555][21515.365463772], took 0.17min 2025-12-04T15:12:37.7848778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.xml 2025-12-04T15:12:37.8632297Z Running inductor/test_op_completeness 1/1 ... [2025-12-04 15:12:37.862908][21515.472815722] 2025-12-04T15:12:37.8632903Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:12:37.8636115Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_completeness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:12:37.863340] 2025-12-04T15:12:43.7362841Z 2025-12-04T15:12:43.7364684Z inductor/test_op_completeness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_completeness_1.1_5deb9907383c3460_.log 2025-12-04T15:12:43.7370864Z Running 5 items in this shard: test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_vec_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_halide_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_metal_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_triton_overrides 2025-12-04T15:12:43.7375507Z 2025-12-04T15:12:43.7376247Z Finished inductor/test_op_completeness 1/1 ... [2025-12-04 15:12:43.736053][21521.345961931], took 0.10min 2025-12-04T15:12:43.7662116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.xml 2025-12-04T15:12:43.7979965Z Running inductor/test_subgraph_choice 1/1 ... [2025-12-04 15:12:43.797628][21521.407536432] 2025-12-04T15:12:43.7980922Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:12:43.7984447Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_subgraph_choice.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:12:43.798090] 2025-12-04T15:13:02.9413754Z 2025-12-04T15:13:02.9415320Z inductor/test_subgraph_choice 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_subgraph_choice_1.1_927735b69ebf1973_.log 2025-12-04T15:13:02.9417345Z Running 2 items in this shard: test/inductor/test_subgraph_choice.py::TestSubgraphChoice::test_subgraph_decompose_k, test/inductor/test_subgraph_choice.py::TestSubgraphChoice::test_subgraph_freeze_layout 2025-12-04T15:13:02.9418388Z 2025-12-04T15:13:02.9418760Z Finished inductor/test_subgraph_choice 1/1 ... [2025-12-04 15:13:02.941149][21540.551056066], took 0.32min 2025-12-04T15:13:02.9709384Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.xml 2025-12-04T15:13:03.0640275Z Running inductor/test_cutedsl_grouped_mm 1/1 ... [2025-12-04 15:13:03.063641][21540.673547417] 2025-12-04T15:13:03.0640982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:13:03.0644182Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_grouped_mm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:13:03.064087] 2025-12-04T15:13:08.3364690Z 2025-12-04T15:13:08.3365964Z inductor/test_cutedsl_grouped_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_grouped_mm_1.1_4f25a6335f622148_.log 2025-12-04T15:13:08.3382049Z Running 24 items in this shard: test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_contiguous_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_contiguous_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_offset_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_offset_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_padded_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_padded_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_view_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_view_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_64_N_256, 
test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_64_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_64_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_64_N_256 2025-12-04T15:13:08.3396925Z 2025-12-04T15:13:08.3397322Z Finished inductor/test_cutedsl_grouped_mm 1/1 ... [2025-12-04 15:13:08.336299][21545.946206704], took 0.09min 2025-12-04T15:13:08.3662274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.xml 2025-12-04T15:13:08.4032645Z Running inductor/test_cpp_wrapper_hipify 1/1 ... 
[2025-12-04 15:13:08.402922][21546.012828598] 2025-12-04T15:13:08.4033242Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:13:08.4036038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpp_wrapper_hipify.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:08.403335] 2025-12-04T15:13:14.7269544Z 2025-12-04T15:13:14.7270692Z inductor/test_cpp_wrapper_hipify 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpp_wrapper_hipify_1.1_353d02c262482f20_.log 2025-12-04T15:13:14.7273063Z Running 3 items in this shard: test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_aoti_driver_header, test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_basic_declaration, test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_cross_platform 2025-12-04T15:13:14.7274643Z 2025-12-04T15:13:14.7275028Z Finished inductor/test_cpp_wrapper_hipify 1/1 ... [2025-12-04 15:13:14.726724][21552.336633601], took 0.11min 2025-12-04T15:13:14.7565948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.xml 2025-12-04T15:13:14.8470771Z Running inductor/test_inductor_utils 1/1 ... [2025-12-04 15:13:14.846740][21552.456647425] 2025-12-04T15:13:14.8471369Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:13:14.8477202Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:13:14.847159] 2025-12-04T15:13:23.0734019Z 2025-12-04T15:13:23.0735174Z inductor/test_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_utils_1.1_67afa62609840b86_.log 2025-12-04T15:13:23.0736780Z Running 2 items in this shard: test/inductor/test_inductor_utils.py::TestBench::test_benchmarker, test/inductor/test_inductor_utils.py::TestBench::test_do_bench_using_profiling 2025-12-04T15:13:23.0737676Z 2025-12-04T15:13:23.0738053Z Finished inductor/test_inductor_utils 1/1 ... [2025-12-04 15:13:23.073169][21560.68307914], took 0.14min 2025-12-04T15:13:23.1029165Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.xml 2025-12-04T15:13:23.1849301Z Running inductor/test_template_heuristics_registry 1/1 ... [2025-12-04 15:13:23.184630][21560.79453802] 2025-12-04T15:13:23.1849972Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:13:23.1853381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_template_heuristics_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:13:23.185064] 2025-12-04T15:13:29.6586708Z 2025-12-04T15:13:29.6588165Z inductor/test_template_heuristics_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_template_heuristics_registry_1.1_3f598775c056439a_.log 2025-12-04T15:13:29.6592079Z Running 5 items in this shard: test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_assertion_existing_class, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_fallback_behavior, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_hierarchy_lookup, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_partial_hierarchy_scenarios, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_register_class 2025-12-04T15:13:29.6595121Z 2025-12-04T15:13:29.6595561Z Finished inductor/test_template_heuristics_registry 1/1 ... [2025-12-04 15:13:29.658452][21567.268361704], took 0.11min 2025-12-04T15:13:29.6882016Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.xml 2025-12-04T15:13:29.7660004Z Running inductor/test_async_compile 1/1 ... [2025-12-04 15:13:29.765646][21567.375552441] 2025-12-04T15:13:29.7660598Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:13:29.7663345Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_async_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:13:29.766075] 2025-12-04T15:14:46.0425306Z 2025-12-04T15:14:46.0426439Z inductor/test_async_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_async_compile_1.1_887cb91e60faea2f_.log 2025-12-04T15:14:46.0430639Z Running 8 items in this shard: test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_bad_kernel, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_wait_pool_ready 2025-12-04T15:14:46.0434448Z 2025-12-04T15:14:46.0434938Z Finished inductor/test_async_compile 1/1 ... [2025-12-04 15:14:46.042313][21643.652221601], took 1.27min 2025-12-04T15:14:46.0729963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.xml 2025-12-04T15:14:46.1575991Z Running dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 15:14:46.157250][21643.767157403] 2025-12-04T15:14:46.1576605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:14:46.1579623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_deque_reconstruct.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:14:46.157703] 2025-12-04T15:14:53.8833547Z 2025-12-04T15:14:53.8834650Z dynamo/test_deque_reconstruct 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_deque_reconstruct_1.1_f8b7d34594077ea6_.log 2025-12-04T15:14:53.8837061Z Running 3 items in this shard: test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_not_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_shallows_globals 2025-12-04T15:14:53.8838722Z 2025-12-04T15:14:53.8839341Z Finished dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 15:14:53.883136][21651.493046313], took 0.13min 2025-12-04T15:14:53.9134014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.xml 2025-12-04T15:14:53.9885084Z Running inductor/test_utils 1/1 ... [2025-12-04 15:14:53.988222][21651.598120994] 2025-12-04T15:14:53.9885611Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:14:53.9888774Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:14:53.988653] 2025-12-04T15:15:01.1133872Z 2025-12-04T15:15:01.1134847Z inductor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_utils_1.1_63e5e2174acc542d_.log 2025-12-04T15:15:01.1138059Z Running 7 items in this shard: test/inductor/test_utils.py::TestUtilsCUDA::testSympySubs_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_flops_fx_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_bfloat16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float32, test/inductor/test_utils.py::TestUtilsCUDA::test_sympy_str_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_zip_schema_cuda 2025-12-04T15:15:01.1140626Z 2025-12-04T15:15:01.1140952Z Finished inductor/test_utils 1/1 ... [2025-12-04 15:15:01.113183][21658.72309247], took 0.12min 2025-12-04T15:15:01.1437458Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.xml 2025-12-04T15:15:01.2442179Z Running inductor/test_indexing 1/1 ... [2025-12-04 15:15:01.243921][21658.853828365] 2025-12-04T15:15:01.2442771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:15:01.2445966Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:15:01.244361] 2025-12-04T15:15:20.7877527Z 2025-12-04T15:15:20.7878526Z inductor/test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_indexing_1.1_2bd025888cab1cf8_.log 2025-12-04T15:15:20.7888640Z Running 22 items in this shard: test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_applied, test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_skipped, test/inductor/test_indexing.py::TestIndexingSimplification::test_floordiv_div_sympy_is_integer_bug, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_join, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_simplification, test/inductor/test_indexing.py::TestIndexingSimplification::test_int8_unpack, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_not_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_positive, test/inductor/test_indexing.py::ExprPrinterTests::test_print_Min_Max, test/inductor/test_indexing.py::ExprPrinterTests::test_print_ceil, test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor, test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor_div, test/inductor/test_indexing.py::ExprPrinterTests::test_print_integer, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod_index, test/inductor/test_indexing.py::ExprPrinterTests::test_print_pow, test/inductor/test_indexing.py::ExprPrinterTests::test_print_python_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_-1, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_0, 
test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_1 2025-12-04T15:15:20.7897785Z 2025-12-04T15:15:20.7898124Z Finished inductor/test_indexing 1/1 ... [2025-12-04 15:15:20.787556][21678.397464601], took 0.33min 2025-12-04T15:15:20.8184358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.xml 2025-12-04T15:15:20.9661624Z Running inductor/test_inductor_annotations 1/1 ... [2025-12-04 15:15:20.965836][21678.575743171] 2025-12-04T15:15:20.9662256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:15:20.9665307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:15:20.966276] 2025-12-04T15:15:39.5085720Z 2025-12-04T15:15:39.5086840Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_e129b89bdd73962f_.log 2025-12-04T15:15:39.5088833Z Running 2 items in this shard: test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_no_annotations, test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_training_annotation 2025-12-04T15:15:39.5089975Z 2025-12-04T15:15:39.5090388Z Finished inductor/test_inductor_annotations 1/1 ... [2025-12-04 15:15:39.508348][21697.118257512], took 0.31min 2025-12-04T15:15:39.5389634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.xml 2025-12-04T15:15:39.6196622Z Running inductor/test_compile_worker 1/1 ... 
[2025-12-04 15:15:39.619354][21697.229261204] 2025-12-04T15:15:39.6197237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:15:39.6200316Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:15:39.619777] 2025-12-04T15:17:11.8158341Z 2025-12-04T15:17:11.8159689Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_00f9da717f84f877_.log 2025-12-04T15:17:11.8166704Z Running 16 items in this shard: test/inductor/test_compile_worker.py::TestCompileWorker::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorker::test_crash, test/inductor/test_compile_worker.py::TestCompileWorker::test_exception, test/inductor/test_compile_worker.py::TestCompileWorker::test_logging, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_crash, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_exception, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_logging, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestTimer::test_basics, test/inductor/test_compile_worker.py::TestTimer::test_never_fires, test/inductor/test_compile_worker.py::TestTimer::test_repeated_calls, test/inductor/test_compile_worker.py::TestTimer::test_spammy_calls 2025-12-04T15:17:11.8173089Z 2025-12-04T15:17:11.8173558Z Finished 
inductor/test_compile_worker 1/1 ... [2025-12-04 15:17:11.815624][21789.425533212], took 1.54min 2025-12-04T15:17:11.8469400Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.xml 2025-12-04T15:17:11.9207547Z Running dynamo/test_einops 1/1 ... [2025-12-04 15:17:11.920423][21789.530329785] 2025-12-04T15:17:11.9208115Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:17:11.9211208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_einops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:17:11.920856] 2025-12-04T15:17:16.7922053Z 2025-12-04T15:17:16.7923048Z dynamo/test_einops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_einops_1.1_fa1def1006f21bae_.log 2025-12-04T15:17:16.7924924Z Running 3 items in this shard: test/dynamo/test_einops.py::TestEinops::test_functions_version_none, test/dynamo/test_einops.py::TestEinops::test_layers_version_none, test/dynamo/test_einops.py::TestEinops::test_no_recompile_on_lazy_state_version_none 2025-12-04T15:17:16.7926232Z 2025-12-04T15:17:16.7926545Z Finished dynamo/test_einops 1/1 ... [2025-12-04 15:17:16.791967][21794.401876585], took 0.08min 2025-12-04T15:17:16.8229093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.xml 2025-12-04T15:17:16.8606326Z Running inductor/test_external_callables 1/1 ... 
[2025-12-04 15:17:16.860327][21794.470233362] 2025-12-04T15:17:16.8606938Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:17:16.8610381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_external_callables.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:17:16.860769] 2025-12-04T15:17:41.7124499Z 2025-12-04T15:17:41.7125623Z inductor/test_external_callables 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_external_callables_1.1_532bdcfa274f54bc_.log 2025-12-04T15:17:41.7128798Z Running 3 items in this shard: test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cpu, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cuda, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_dup 2025-12-04T15:17:41.7130413Z 2025-12-04T15:17:41.7130825Z Finished inductor/test_external_callables 1/1 ... [2025-12-04 15:17:41.712242][21819.322151085], took 0.41min 2025-12-04T15:17:41.7436169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.xml 2025-12-04T15:17:41.8302778Z Running test_testing 1/1 ... [2025-12-04 15:17:41.829955][21819.439862858] 2025-12-04T15:17:41.8303297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:17:41.8306228Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_testing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:17:41.830381] 2025-12-04T15:18:51.0526727Z 2025-12-04T15:18:51.0528152Z test_testing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_testing_1.1_a28c99e40f247370_.log 2025-12-04T15:18:51.1800073Z Running 2074 items in this shard: test/test_testing.py::TestTestingCUDA::test_assertEqual_longMessage_cuda, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_bool, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int8, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_utils_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_get_supported_dtypes_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_bool, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float64, 
test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_bool_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_equality_shortcut_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float64, test/test_testing.py::TestTestingCUDA::test_setup_and_teardown_run_for_device_specific_tests_cuda, test/test_testing.py::TestTestingCUDA::test_supported_dtypes_abs_cuda, 
test/test_testing.py::TestFrameworkUtils::test_filtering_env_var, test/test_testing.py::TestAssertClose::test_bool, test/test_testing.py::TestAssertClose::test_default_tolerance_selection_mismatching_dtypes, test/test_testing.py::TestAssertClose::test_docstring_examples, test/test_testing.py::TestAssertClose::test_matching, test/test_testing.py::TestAssertClose::test_matching_atol, test/test_testing.py::TestAssertClose::test_matching_conjugate_bit, test/test_testing.py::TestAssertClose::test_matching_nan, test/test_testing.py::TestAssertClose::test_matching_nan_with_equal_nan, test/test_testing.py::TestAssertClose::test_matching_rtol, test/test_testing.py::TestAssertClose::test_meta, test/test_testing.py::TestAssertClose::test_mismatching_dtype, test/test_testing.py::TestAssertClose::test_mismatching_dtype_no_check, test/test_testing.py::TestAssertClose::test_mismatching_layout, test/test_testing.py::TestAssertClose::test_mismatching_layout_no_check, test/test_testing.py::TestAssertClose::test_mismatching_shape, test/test_testing.py::TestAssertClose::test_mismatching_stride, test/test_testing.py::TestAssertClose::test_mismatching_stride_no_check, test/test_testing.py::TestAssertClose::test_mismatching_types, test/test_testing.py::TestAssertClose::test_mismatching_types_subclasses, test/test_testing.py::TestAssertClose::test_mismatching_types_type_equality, test/test_testing.py::TestAssertClose::test_mismatching_values, test/test_testing.py::TestAssertClose::test_mismatching_values_atol, test/test_testing.py::TestAssertClose::test_mismatching_values_rtol, test/test_testing.py::TestAssertClose::test_none, test/test_testing.py::TestAssertClose::test_none_mismatch, test/test_testing.py::TestAssertClose::test_numpy, test/test_testing.py::TestAssertClose::test_only_atol, test/test_testing.py::TestAssertClose::test_only_rtol, test/test_testing.py::TestAssertClose::test_scalar, test/test_testing.py::TestAssertClose::test_unexpected_error_compare, 
test/test_testing.py::TestAssertClose::test_unexpected_error_originate, test/test_testing.py::TestAssertClose::test_unknown_layout, test/test_testing.py::TestAssertClose::test_unknown_type, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_cuda, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_no_check_cuda, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_atol, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_scalars, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_tensor_likes, test/test_testing.py::TestAssertCloseErrorMessage::test_mismatched_elements, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_callable, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_str, test/test_testing.py::TestAssertCloseErrorMessage::test_not_close, test/test_testing.py::TestAssertCloseErrorMessage::test_not_equal, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_rtol, test/test_testing.py::TestAssertCloseErrorMessage::test_small_float_dtype, test/test_testing.py::TestAssertCloseErrorMessage::test_zero_div_zero, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_keys, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_values_msg, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_len, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_coalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_uncoalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_indices_msg, 
test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_nnz, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_sparse_dims, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_matching, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_matching, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_matching, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_matching, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_channel, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_tensor, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_is_quantized, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_qscheme, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_uint8, test/test_testing.py::TestTestParametrization::test_apply_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_compose_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_default_names, test/test_testing.py::TestTestParametrization::test_modules_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_multiple_handling_of_same_param_error, test/test_testing.py::TestTestParametrization::test_name_fn, test/test_testing.py::TestTestParametrization::test_ops_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_reparametrize, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_1, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_2, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_3, test/test_testing.py::TestTestParametrization::test_subtest_names, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_5, 
test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_6, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_name_non_primitive_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_invalid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_valid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_list_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_decorator_applies_module_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_multiple_handling_of_same_param_error_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_name_fn_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_decorator_applies_op_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_param_specific_decoration_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_1_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_2_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_3_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_4_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_unparametrized_names_cuda, test/test_testing.py::TestImports::test_circular_dependencies, test/test_testing.py::TestImports::test_lazy_imports_are_lazy, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_functorch, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_torch, test/test_testing.py::TestImports::test_no_warning_on_import, test/test_testing.py::TestImports::test_not_import_sympy, test/test_testing.py::TestOpInfos::test_sample_input, test/test_testing.py::TestOpInfos::test_sample_input_metadata, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_T_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___radd___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rand___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rdiv___cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmod___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmul___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___ror___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rpow___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rsub___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rxor___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators__chunk_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_aminmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_arange_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_as_strided_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_atan2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bernoulli_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_left_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_right_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_xor_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bucketize_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cauchy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_max_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_min_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_complex_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_copysign_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cov_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_embed_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diff_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_floor_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_no_rounding_mode_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_trunc_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_empty_permuted_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eye_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fliplr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_flipud_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_float_power_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_floor_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmod_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gather_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gcd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ge_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_geometric_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gradient_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_heaviside_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_histogramdd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hypot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igamma_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igammac_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_isclose_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_item_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_kthvalue_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lcm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ldexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_le_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_cross_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_log_normal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logaddexp_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logcumsumexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_fill_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_max_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_maximum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mean_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_median_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_min_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_minimum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_movedim_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mul_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_multinomial_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_copy_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_native_layer_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ne_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_neg_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nextafter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_embedding_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_group_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hardtanh_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_huber_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_l1_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multi_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multilabel_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_prelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rms_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rrelu_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_softshrink_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_normal_in_place_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ormqr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_polar_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_pow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_remainder_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_renorm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_roll_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rot90_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rsub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_bartlett_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_blackman_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_gaussian_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hann_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_kaiser_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_nuttall_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_h_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_he_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_laguerre_polynomial_l_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_legendre_polynomial_p_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_xlog1py_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_zeta_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sum_to_size_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_take_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_trace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_tril_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_triu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_true_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_uniform_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vdot_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_where_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_xlogy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acosh_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_broadcast_tensors_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_copysign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_hypot_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lt_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_elu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu6_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsub_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signbit_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_where_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_H_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_T_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___getitem___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmatmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__batch_norm_with_update_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__chunk_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_lengths_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_offsets_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__softmax_backward_data_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_put_accumulate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__upsample_bilinear2d_aa_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_decomposed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_alias_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_all_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_allclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_aminmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_any_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_arange_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argsort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argwhere_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_partial_views_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_baddbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bernoulli_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bincount_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_block_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bool_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_shapes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cartesian_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cauchy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_inverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chunk_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_column_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_combinations_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_constant_pad_nd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_corrcoef_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_count_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cov_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cross_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagflat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diff_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_trunc_rounding_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_einsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_permuted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_equal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eye_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flip_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fliplr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flipud_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gather_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geometric_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geqrf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gradient_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hash_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_histc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_i0_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_inner_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isinf_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_istft_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_item_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kron_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kthvalue_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lerp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lgamma_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cond_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_det_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eig_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_householder_product_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_multi_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_slogdet_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svdvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vander_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vecdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vector_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logcumsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_unpack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mH_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mT_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matrix_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_msort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_multinomial_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmedian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanquantile_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nansum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_dropout_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_channel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_glu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_head_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_negative_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rms_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_selu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_static_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_fro_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_inf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_nuc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_in_place_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_number_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ormqr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_outer_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pca_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pinverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_quantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rand_like_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ravel_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_renorm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_interleave_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize_as__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_roll_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rot90_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scalar_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_searchsorted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_mm_reduce_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtr_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_list_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sqrt_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_multiple_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_to_size_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_along_dim_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensor_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensordot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_sparse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_topk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapz_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triangular_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unflatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_uniform_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_consecutive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unravel_index_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_real_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_xlogy_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zero__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_like_cuda_float32 2025-12-04T15:18:51.2824883Z 2025-12-04T15:18:51.2825199Z Finished test_testing 1/1 ... [2025-12-04 15:18:51.056592][21888.666494533], took 1.15min 2025-12-04T15:18:51.2826506Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.xml 2025-12-04T15:18:51.2827654Z Running dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 15:18:51.230380][21888.840286167] 2025-12-04T15:18:51.2828222Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:18:51.2829444Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:51.230832] 2025-12-04T15:18:59.7571596Z 2025-12-04T15:18:59.7572978Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_7c7f9dd585a9f6c9_.log 2025-12-04T15:18:59.7574342Z Running 1 items in this shard: test/dynamo/test_fx_passes_pre_grad.py::FxPassesPreGradTests::test_pass_execution_and_save 2025-12-04T15:18:59.7574941Z 2025-12-04T15:18:59.7575311Z Finished dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 15:18:59.756946][21897.366856169], took 0.14min 2025-12-04T15:18:59.7879994Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.xml 2025-12-04T15:18:59.8651003Z Running export/test_strict_export_v2 1/1 ... 
[2025-12-04 15:18:59.864753][21897.474659718] 2025-12-04T15:18:59.8651617Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:18:59.8654279Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_strict_export_v2.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:59.865154] 2025-12-04T15:21:08.5161783Z 2025-12-04T15:21:08.5162852Z export/test_strict_export_v2 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_strict_export_v2_1.1_3c4ed2fe1af04b4b_.log 2025-12-04T15:21:08.5409617Z Running 440 items in this shard: test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_assume_static_by_default_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_constraints_error_not_in_range_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_constraints_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_inline_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_slice_maxsize_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_slice_unbacked_dim1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_strict_narrow_unbacked_expr_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_no_grad_param_inplace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_reshape_view_backed_size_oblivious_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test__scaled_dot_product_flash_attention_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_additional_inputs_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_allow_explicit_guards_as_runtime_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_annotate_on_assert_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_args_type_checked_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_aten_lift_fresh_copy_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_attention_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_attr_assignment_extra_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_constrain_size_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_constant_relation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_linear_relation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_simple_equality_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_baddbmm_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_non_strict_fake_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_non_strict_real_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_bincount_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_buffer_util_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_constructor_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_constructor_torch_ir_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_wrong_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_ccode_python_mod_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cdist_forward_compute_mode_zero_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_check_specialized_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_checks_to_constrain_range_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cleanup_dynamic_markers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_colin_unbacked_backed_vr_sub_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_colon_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_compiling_state_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_access_identical_symint_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_branches_return_constant_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_branches_return_same_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_contains_unbacked_no_escape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_int_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_unflatten_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_with_module_stack_export_with_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_with_module_stack_export_with_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_aliasing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_input_naming_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_no_user_inp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_output_dup_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_requires_grad_const_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_return_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_with_non_functional_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_with_non_functional_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_in_eager_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_with_constrain_value_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_with_various_cases_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_conv_dynamic_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_crop_like_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cse_for_symint_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_functionalize_pre_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_functionalize_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_warn_pre_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_preserve_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_pytree_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_tag_metadata_re_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_batch_norm_functional_predispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_item_in_prim_after_decomposition_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_item_in_prim_before_decomposition_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_default_decomposition_core_cia_ops_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_1_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_integer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_repeat_derived_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_simplified_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_repeat_derived_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_nonstrict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_nonstrict_with_stacktrace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_strict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_gpu_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_mutation_float_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_static_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_1_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_auto_and_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_divisibility_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_hint_range_violations_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_hint_ranges_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_disable_forced_specializations_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_disable_forced_specializations_ok_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_gather_into_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_gather_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_reduce_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_to_all_single_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_reduce_scatter_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dont_duck_size_for_auto_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_double_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_aliasing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_list_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_with_nan_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_fake_kernel_inference_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_infers_fake_kernel_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_duplicate_modules_with_non_persistent_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_lr_shift_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_bounds_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_pytree_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_dataclass_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_inferred_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_generic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_user_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_various_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_spec_with_pytree_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_wrapped_with_shape_guards_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_sym_round_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_ends_of_bounds_oblivious_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_enum_str_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_error_does_not_reference_eager_fallback_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_error_when_passing_mutating_primitive_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_exception_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_expand_copy_export_handles_implicit_true_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_api_with_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_as_backend_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_lifted_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_symbol_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_symbol_scandim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_subclass_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_preserve_torch_fn_for_subgraphs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_symbool_pred_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_warns_constant_pred_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_decomp_table_basic_pop_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_decomp_table_container_methods_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_op_lib_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_triton_kernel_mutable_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_triton_kernel_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cyclic_reference_leak_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomp_torture_case_1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomp_torture_case_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomps_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomps_simple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_dynamo_config_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_run_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_container_type_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_state_dict_hooks_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_default_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_keyword_only_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_kwargs_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_pytree_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_keyword_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_keyword_pytree_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_postional_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_function_schema_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_graph_with_no_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_bug_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_dynamic_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_static_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_leak_compile_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_linear_preserve_dynamic_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_max_nonstrict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_max_onnx_reported_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_mod_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_preserve_linear_at_aot_level_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_preserve_linear_but_not_custom_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_rnn_variants_with_warning_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_scan_pytree_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_script_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_statically_known_true_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_then_compile_tensor_ctor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_autocast_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_fake_tensor_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_inline_constraints_complex_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_inline_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_set_grad_enabled_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_wrong_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_external_call_non_strict_real_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fake_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fake_weights_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_filter_traceback_frames_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_flex_attention_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_float_conversion_from_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_float_conversion_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fqn_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_from_node_metadata_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_full_on_scalar_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_function_holding_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_hints_wrapper_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_hoo_inline_users_issue_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_if_functional_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_if_post_autograd_op_preserved_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inductor_backend_inside_nonstrict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_class_method_recursive_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_class_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_int_shape_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_intermediate_shape_comp_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_invalid_pytree_dynamo_graph_capture_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_is_exporting_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_is_nonzero_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_isnonzero_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_113041_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_157289_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_161902_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_istft_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_invalid_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_linear_convd_for_training_ir_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_linear_convd_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_kwarg_dynamic_shapes_diff_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_kwargs_reorder_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_layer_norm_unbacked_normalized_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_layer_sharing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_lazy_module_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_linear_conv_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_malformed_fqn_from_source_name_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_map_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_map_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mask_nonzero_static_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_masked_select_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_math_pow_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mismatched_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mixed_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_dict_key_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_input_subclasses_parameterization_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_list_slice_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_with_dict_container_inp_out_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_modules_access_for_deleted_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_more_multidimensional_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multidimensional_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multinomial_dynamic_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multiple_definitions_same_name_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_namedtuple_input_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_native_multi_attention_head_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_dynamic_shapes_spec_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_fake_tensor_leak_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_constant_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_init_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nn_module_stack_shared_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_check_is_size_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_suggested_fixes_for_data_dependent_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_3_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_persistent_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_strict_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_strict_dynamic_shapes_suggested_fixes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_none_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonstrict_retrace_preserves_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonzero_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonzero_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_not_registered_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_operator_aten_tensor_mode_variant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_output_node_name_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pad_sequence_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_param_util_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_partial_patched_forward_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_collisions_hoo_subgraphs_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_collisions_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_order_variadic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_update_preserving_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_predispatch_cond_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_predispatch_grad_wrappers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_annotation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_module_call_signature_unflatten_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_requires_grad_placeholders_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_shape_dynamism_for_unused_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_profiling_code_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_python_asserts_with_sym_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pytree_register_data_class_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pytree_register_nested_data_class_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_raise_user_error_when_guard_on_data_dependent_operation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_range_constraints_with_replacement_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_alias_dtype_mismatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_bool_cast_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_errors_on_aliasing_custom_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_for_max_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_size_mismatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_redundant_assert_max_upper_bound_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_redundant_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_refine_dynamic_shapes_from_suggested_fixes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_register_constant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_repeat_interleave_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_replace_unbacked_with_very_large_upperbound_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_replaced_unbacked_bindings_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_reshape_view_helper_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_retracable_ep_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_retrace_pre_autograd_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decomposition_supports_user_input_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decompositions_keep_metadata_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decompositions_keep_tensor_constant_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_for_prim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_for_prm_str_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_with_size_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sdpa_gqa_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sequential_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_example_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_as_side_effect_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_empty_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_setgrad_lifted_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_shared_submodule_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_simple_export_for_training_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_simple_unbacked_view_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_size_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_slice_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_solver_unsupported_sympy_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_specialize_derived_dim_roots_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_split_const_gm_with_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_stack_trace_make_fx_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_stack_trace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_primitives_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_shape_attribute_assignment_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_tensors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_static_dim_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_context_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_complicated_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_const_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclasses_parameterization_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclasses_parameterization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggest_torch_checks_with_non_negative_check_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggest_torch_checks_with_regular_check_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_for_data_dependent_errors_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_new_roots_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_float_operators_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_or_sym_and_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_sqrt_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symbool_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symfloat_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_additional_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_ranges_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_shapes_collection_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_tensor_return_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tag_ac_export_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_attribute_zero_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_constant_aten_to_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_constant_with_wrapped_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_multiple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tolist_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_torch_check_eq_commutativity_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_torch_fn_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_trace_under_fake_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_train_eval_on_exported_preautograd_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tril_dynamic_diagonal_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_triu_dynamic_diagonal_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_3d_matmul_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_bincount_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_bindings_for_divisible_u_symint_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_deferred_runtime_retrace_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_expand_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_infer_size_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_kth_value_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_linear_layer_norm_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_noncontig_lin_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_pad_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_scalar_constructor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_slice_forward_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_slice_simple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_to_cond_passthrough_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_to_cond_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_unsqueeze_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_buffer_update_child2parent_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_isinstance_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_dispatch_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_shared_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_state_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_no_unroll_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_placeholder_update_child2parent_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_5_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_6_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_buf_8_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_const_preserving_3_1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_const_preserving_3_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_6_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_9_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_10_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_5_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_7_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_preserving_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unused_aliases_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unused_constant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_uplift_common_custom_meta_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_uplift_common_custom_meta_with_multiple_calls_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_use_embedding_twice_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_user_input_and_buffer_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_custom_autograd_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_to_assert_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_where_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_assert_separation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_index_assertions_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_simple_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_tensor_constant_idx_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_wrapper_module_strict_export_v2 2025-12-04T15:21:08.5650822Z 2025-12-04T15:21:08.5651234Z Finished export/test_strict_export_v2 1/1 ... [2025-12-04 15:21:08.517457][22026.127362825], took 2.14min 2025-12-04T15:21:08.5652555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.xml 2025-12-04T15:21:08.6396238Z Running export/test_functionalized_assertions 1/1 ... [2025-12-04 15:21:08.639266][22026.249172503] 2025-12-04T15:21:08.6397203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:08.6399665Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_functionalized_assertions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:08.639691] 2025-12-04T15:21:13.9116116Z 2025-12-04T15:21:13.9117373Z export/test_functionalized_assertions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_functionalized_assertions_1.1_7d17ab73392af6b4_.log 2025-12-04T15:21:13.9119735Z Running 2 items in this shard: test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_assert_async_msg, test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_sym_constrain_range 2025-12-04T15:21:13.9121180Z 2025-12-04T15:21:13.9121597Z Finished export/test_functionalized_assertions 1/1 ... 
[2025-12-04 15:21:13.911393][22031.521302758], took 0.09min 2025-12-04T15:21:13.9429390Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.xml 2025-12-04T15:21:13.9718910Z Running inductor/test_selective_lowering 1/1 ... [2025-12-04 15:21:13.971641][22031.581548786] 2025-12-04T15:21:13.9719496Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:13.9723161Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_selective_lowering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:13.972029] 2025-12-04T15:21:32.7157630Z 2025-12-04T15:21:32.7158716Z inductor/test_selective_lowering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_selective_lowering_1.1_e1c78d2a5185c394_.log 2025-12-04T15:21:32.7160643Z Running 2 items in this shard: test/inductor/test_selective_lowering.py::SelectiveLoweringTest::test_basic_selective_lowering, test/inductor/test_selective_lowering.py::SelectiveLoweringTest::test_no_fallback_when_unmarked 2025-12-04T15:21:32.7161798Z 2025-12-04T15:21:32.7162244Z Finished inductor/test_selective_lowering 1/1 ... [2025-12-04 15:21:32.715524][22050.325432969], took 0.31min 2025-12-04T15:21:32.7470654Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.xml 2025-12-04T15:21:32.8210165Z Running dynamo/test_base_output 1/1 ... 
[2025-12-04 15:21:32.820702][22050.430609748] 2025-12-04T15:21:32.8210766Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:32.8213641Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:32.821117] 2025-12-04T15:21:38.1431060Z 2025-12-04T15:21:38.1431978Z dynamo/test_base_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_output_1.1_c6d6552f20e02364_.log 2025-12-04T15:21:38.1434591Z Running 6 items in this shard: test/dynamo/test_base_output.py::TestBaseOutput::test_assign, test/dynamo/test_base_output.py::TestBaseOutput::test_create, test/dynamo/test_base_output.py::TestBaseOutput::test_getattr, test/dynamo/test_base_output.py::TestBaseOutput::test_getitem, test/dynamo/test_base_output.py::TestBaseOutput::test_index, test/dynamo/test_base_output.py::TestBaseOutput::test_tuple 2025-12-04T15:21:38.1436540Z 2025-12-04T15:21:38.1436881Z Finished dynamo/test_base_output 1/1 ... [2025-12-04 15:21:38.142918][22055.752827855], took 0.09min 2025-12-04T15:21:38.1746563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.xml 2025-12-04T15:21:38.2086013Z Running inductor/test_lookup_table 1/1 ... [2025-12-04 15:21:38.208356][22055.818263765] 2025-12-04T15:21:38.2086588Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:38.2090027Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_lookup_table.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:21:38.208744] 2025-12-04T15:21:47.8226363Z 2025-12-04T15:21:47.8227542Z inductor/test_lookup_table 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_lookup_table_1.1_47a98ebb9baf620f_.log 2025-12-04T15:21:47.8228368Z 2025-12-04T15:21:47.8228741Z Finished inductor/test_lookup_table 1/1 ... [2025-12-04 15:21:47.822403][22065.432312847], took 0.16min 2025-12-04T15:21:47.8548384Z Running export/test_serialize 1/1 ... [2025-12-04 15:21:47.854555][22065.464462681] 2025-12-04T15:21:47.8548955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:21:47.8552036Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serialize.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:47.854954] 2025-12-04T15:22:25.7739317Z 2025-12-04T15:22:25.7740551Z export/test_serialize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serialize_1.1_aebb5c7eea9352a2_.log 2025-12-04T15:22:25.7787042Z Running 116 items in this shard: test/export/test_serialize.py::TestSerialize::test_1D_tensor_slicing, test/export/test_serialize.py::TestSerialize::test_2D_tensor_slicing, test/export/test_serialize.py::TestSerialize::test_canonicalize, test/export/test_serialize.py::TestSerialize::test_complex_constant, test/export/test_serialize.py::TestSerialize::test_empty_constant, test/export/test_serialize.py::TestSerialize::test_empty_state_dict, test/export/test_serialize.py::TestSerialize::test_export_example_inputs_preserved, test/export/test_serialize.py::TestSerialize::test_export_with_extension_op_serialization, test/export/test_serialize.py::TestSerialize::test_int_list, test/export/test_serialize.py::TestSerialize::test_kwargs_default, test/export/test_serialize.py::TestSerialize::test_metadata_parsing_with_layer_split, 
test/export/test_serialize.py::TestSerialize::test_metadata_run_decomp_serder, test/export/test_serialize.py::TestSerialize::test_multi_return_some_unused, test/export/test_serialize.py::TestSerialize::test_nested_layer_split, test/export/test_serialize.py::TestSerialize::test_non_float_weight, test/export/test_serialize.py::TestSerialize::test_nonfinite_inputs, test/export/test_serialize.py::TestSerialize::test_predispatch_export_with_autograd_op, test/export/test_serialize.py::TestSerialize::test_preserve_aliasing, test/export/test_serialize.py::TestSerialize::test_rational_ranges, test/export/test_serialize.py::TestSerialize::test_serialize_constant_outputs, test/export/test_serialize.py::TestSerialize::test_serialize_infinite_sym_int, test/export/test_serialize.py::TestSerialize::test_serialize_list_returns, test/export/test_serialize.py::TestSerialize::test_serialize_multiple_returns_from_node, test/export/test_serialize.py::TestSerialize::test_serialize_param_mutation, test/export/test_serialize.py::TestSerialize::test_serialize_sym_float, test/export/test_serialize.py::TestSerialize::test_serialize_sym_int, test/export/test_serialize.py::TestSerialize::test_storage_offset, test/export/test_serialize.py::TestSerialize::test_symint_list, test/export/test_serialize.py::TestSerialize::test_triton_hop, test/export/test_serialize.py::TestSerialize::test_weight_sharing_gpu, test/export/test_serialize.py::TestDeserialize::test_arg_from, test/export/test_serialize.py::TestDeserialize::test_auto_functionalize, test/export/test_serialize.py::TestDeserialize::test_basic, test/export/test_serialize.py::TestDeserialize::test_cond, test/export/test_serialize.py::TestDeserialize::test_constraints, test/export/test_serialize.py::TestDeserialize::test_custom_obj, test/export/test_serialize.py::TestDeserialize::test_custom_obj_list_out, test/export/test_serialize.py::TestDeserialize::test_custom_obj_tuple_out, test/export/test_serialize.py::TestDeserialize::test_device, 
test/export/test_serialize.py::TestDeserialize::test_dynamic, test/export/test_serialize.py::TestDeserialize::test_export_no_inputs, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_assume_constant_result, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_autograd_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_class_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_class_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_nested_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_nonlocal_variables, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_closed_over_variable, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_operands, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_predicate, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_constrain_as_size_example, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_constrain_as_value_example, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_decorator, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dictionary, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_assert, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_constructor, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_if_guard, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_map, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_slicing, 
test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_view, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_fn_with_kwargs, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_list_contains, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_list_unpack, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_model_attr_mutation, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_nested_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_null_context_manager, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_optional_input, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_pytree_flatten, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_scalar_output, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_specialized_attribute, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_static_for_loop, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_static_if, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_tensor_setattr, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_type_reflection_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_user_input_mutation, test/export/test_serialize.py::TestDeserialize::test_forward_compatibility, test/export/test_serialize.py::TestDeserialize::test_get_attr, test/export/test_serialize.py::TestDeserialize::test_get_attr_list, test/export/test_serialize.py::TestDeserialize::test_hoo_symint_input, test/export/test_serialize.py::TestDeserialize::test_list_of_optional_tensors, test/export/test_serialize.py::TestDeserialize::test_map, test/export/test_serialize.py::TestDeserialize::test_module, 
test/export/test_serialize.py::TestDeserialize::test_module_meta, test/export/test_serialize.py::TestDeserialize::test_multi_return, test/export/test_serialize.py::TestDeserialize::test_multiple_getitem, test/export/test_serialize.py::TestDeserialize::test_none_input, test/export/test_serialize.py::TestDeserialize::test_optional_tuple, test/export/test_serialize.py::TestDeserialize::test_positional_argument_with_default_value, test/export/test_serialize.py::TestDeserialize::test_pytree_namedtuple, test/export/test_serialize.py::TestDeserialize::test_serialize_float8, test/export/test_serialize.py::TestDeserialize::test_shape, test/export/test_serialize.py::TestDeserialize::test_sym_bool, test/export/test_serialize.py::TestDeserialize::test_sym_bool_dynamic_shapes, test/export/test_serialize.py::TestDeserialize::test_sym_bool_torch_check_equal, test/export/test_serialize.py::TestDeserialize::test_sym_float, test/export/test_serialize.py::TestDeserialize::test_sym_int_torch_check_equal, test/export/test_serialize.py::TestDeserialize::test_sym_ite, test/export/test_serialize.py::TestDeserialize::test_tensor_tensor_list, test/export/test_serialize.py::TestDeserialize::test_unbacked_bindings_serialize, test/export/test_serialize.py::TestSchemaVersioning::test_error, test/export/test_serialize.py::TestSaveLoad::test_deserialize_torch_artifact_dict, test/export/test_serialize.py::TestSaveLoad::test_save_buffer, test/export/test_serialize.py::TestSaveLoad::test_save_constants, test/export/test_serialize.py::TestSaveLoad::test_save_extra, test/export/test_serialize.py::TestSaveLoad::test_save_file, test/export/test_serialize.py::TestSaveLoad::test_save_load_with_multiple_empty_tensors, test/export/test_serialize.py::TestSaveLoad::test_save_path, test/export/test_serialize.py::TestSaveLoad::test_version_error, test/export/test_serialize.py::TestSerializeCustomClass::test_backed_size_oblivious_serdes, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class, 
test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class_containing_fake_tensor, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class_input_to_function, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_copy, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_decomp, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_serialization, test/export/test_serialize.py::TestSerializeCustomClass::test_unbacked_range_serdes 2025-12-04T15:22:25.7831962Z 2025-12-04T15:22:25.7832296Z Finished export/test_serialize 1/1 ... [2025-12-04 15:22:25.773954][22103.383860351], took 0.63min 2025-12-04T15:22:25.8067009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.xml 2025-12-04T15:22:25.8981494Z Running inductor/test_move_constructors_to_gpu 1/1 ... [2025-12-04 15:22:25.897782][22103.507689601] 2025-12-04T15:22:25.8982136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:22:25.8985202Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_move_constructors_to_gpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:22:25.898229] 2025-12-04T15:22:49.0979215Z 2025-12-04T15:22:49.0980415Z inductor/test_move_constructors_to_gpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_move_constructors_to_gpu_1.1_3373ad77744fe6e4_.log 2025-12-04T15:22:49.0984899Z Running 7 items in this shard: test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_multi_gpu, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_multiple_constructors, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_no_gpu, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_non_convertable_op_failure, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_output_failure, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_sets_equiv, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_simple 2025-12-04T15:22:49.0988477Z 2025-12-04T15:22:49.0988889Z Finished inductor/test_move_constructors_to_gpu 1/1 ... [2025-12-04 15:22:49.097676][22126.707586095], took 0.39min 2025-12-04T15:22:49.1298668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.xml 2025-12-04T15:22:49.2088539Z Running inductor/test_remote_cache 1/1 ... [2025-12-04 15:22:49.208540][22126.818447548] 2025-12-04T15:22:49.2089122Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:22:49.2092121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_remote_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:22:49.208976] 2025-12-04T15:22:54.5310272Z 2025-12-04T15:22:54.5311323Z inductor/test_remote_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_remote_cache_1.1_46ddba7c7bb0dd06_.log 2025-12-04T15:22:54.5313568Z Running 3 items in this shard: test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_logging, test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_no_sample, test/inductor/test_remote_cache.py::TestRemoteCache::test_normal_logging 2025-12-04T15:22:54.5314851Z 2025-12-04T15:22:54.5315211Z Finished inductor/test_remote_cache 1/1 ... [2025-12-04 15:22:54.530810][22132.140720712], took 0.09min 2025-12-04T15:22:54.5633429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.xml 2025-12-04T15:22:54.5928361Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:22:54.592564][22132.202470891] 2025-12-04T15:22:54.5929027Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:22:54.5932154Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:22:54.592979] 2025-12-04T15:23:13.2400325Z 2025-12-04T15:23:13.2401701Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_ec23ddb0902f120e_.log 2025-12-04T15:23:13.2405287Z Running 5 items in this shard: test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_abs_function, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_get_neighbour_values, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_no_neighbors, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_persistent_reduction, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_value_too_large 2025-12-04T15:23:13.2407982Z 2025-12-04T15:23:13.2408400Z Finished inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:23:13.239819][22150.849729028], took 0.31min 2025-12-04T15:23:13.2729144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.xml 2025-12-04T15:23:13.3537971Z Running inductor/test_inplace_padding 1/1 ... [2025-12-04 15:23:13.353513][22150.963420266] 2025-12-04T15:23:13.3538581Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:23:13.3542126Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:23:13.353960] 2025-12-04T15:23:35.5519092Z 2025-12-04T15:23:35.5520170Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_79ffe73bfaa271da_.log 2025-12-04T15:23:35.5524979Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input 2025-12-04T15:23:35.5528862Z 2025-12-04T15:23:35.5529236Z Finished inductor/test_inplace_padding 1/1 ... [2025-12-04 15:23:35.551684][22173.161593597], took 0.37min 2025-12-04T15:23:35.5841645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.xml 2025-12-04T15:23:35.6625837Z Running inductor/test_cudacodecache 1/1 ... [2025-12-04 15:23:35.662280][22173.2721883] 2025-12-04T15:23:35.6626408Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:23:35.6629284Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudacodecache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:23:35.662683] 2025-12-04T15:23:47.7451512Z 2025-12-04T15:23:47.7452563Z inductor/test_cudacodecache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudacodecache_1.1_0486dc99f2c38224_.log 2025-12-04T15:23:47.7454616Z Running 3 items in this shard: test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_async_compile, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_compilation_error, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_cuda_load 2025-12-04T15:23:47.7455940Z 2025-12-04T15:23:47.7456321Z Finished inductor/test_cudacodecache 1/1 ... [2025-12-04 15:23:47.744930][22185.354839995], took 0.20min 2025-12-04T15:23:47.7780244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.xml 2025-12-04T15:23:47.8932973Z Running inductor/test_minifier_utils 1/1 ... [2025-12-04 15:23:47.892900][22185.502807389] 2025-12-04T15:23:47.8933624Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:23:47.8936339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:23:47.893354] 2025-12-04T15:23:55.9195003Z 2025-12-04T15:23:55.9196246Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_29e2300addd2b151_.log 2025-12-04T15:23:55.9198371Z Running 3 items in this shard: test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_convert_module_to_string, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_invalid_output, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_non_exportable 2025-12-04T15:23:55.9200024Z 2025-12-04T15:23:55.9200394Z Finished inductor/test_minifier_utils 1/1 ... [2025-12-04 15:23:55.919240][22193.529149315], took 0.13min 2025-12-04T15:23:55.9525454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.xml 2025-12-04T15:23:56.0513608Z Running inductor/test_debug_trace 1/1 ... [2025-12-04 15:23:56.051046][22193.660953429] 2025-12-04T15:23:56.0514195Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:23:56.0517053Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:23:56.051471] 2025-12-04T15:24:19.6010245Z 2025-12-04T15:24:19.6011259Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_9dbcd0e5470fca07_.log 2025-12-04T15:24:19.6013244Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace 2025-12-04T15:24:19.6014523Z 2025-12-04T15:24:19.6014869Z Finished inductor/test_debug_trace 1/1 ... [2025-12-04 15:24:19.600795][22217.210704721], took 0.39min 2025-12-04T15:24:19.6336513Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.xml 2025-12-04T15:24:19.7202285Z Running inductor/test_foreach 1/1 ... [2025-12-04 15:24:19.719805][22217.329712191] 2025-12-04T15:24:19.7202924Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:24:19.7205489Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:24:19.720273] 2025-12-04T15:33:07.4387486Z 2025-12-04T15:33:07.4391423Z inductor/test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_foreach_1.1_72dc555a9d39f8a0_.log 2025-12-04T15:33:07.4632155Z Running 536 items in this shard: test/inductor/test_foreach.py::ForeachTests::test_2d_block_mixed_sizes_with_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_block_no_mixed_sizes_no_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_mul, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_sub, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_minimum, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_aliasing, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_div, 
test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcdiv, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcmul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_cpp_wrapper_xpu, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_python_wrapper, test/inductor/test_foreach.py::ForeachTests::test_foreach_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_foreach_cpp_wrapper_xpu, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_input_mutation, test/inductor/test_foreach.py::ForeachTests::test_fuse_concat, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_minimum, 
test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_mul, 
test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_addrecip_op, 
test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_multi_device, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sub, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_min, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_mul_, 
test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_sub, 
test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_copy, 
test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_min, 
test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_zero_elems 2025-12-04T15:33:07.4868714Z 2025-12-04T15:33:07.4869085Z Finished inductor/test_foreach 1/1 ... [2025-12-04 15:33:07.439400][22745.049306979], took 8.80min 2025-12-04T15:33:07.4870358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.xml 2025-12-04T15:33:08.8286564Z Uploading artifacts took 1.24 seconds 2025-12-04T15:33:08.8290675Z Running inductor/test_cache 1/1 ... [2025-12-04 15:33:08.828873][22746.438779917] 2025-12-04T15:33:08.8291342Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:33:08.8296034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:33:08.829327] 2025-12-04T15:34:02.6906755Z 2025-12-04T15:34:02.6907944Z inductor/test_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cache_1.1_b15a3258d122eb10_.log 2025-12-04T15:34:02.7292636Z Running 725 items in this shard: test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type1_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type3_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type0_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type2_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type5_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type1_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type0, 
test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type5, 
test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type4, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type1, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type4, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type2, 
test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type0, 
test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type4, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type0, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type2, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type3_get_first_True, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type5_get_first_False, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type0_get_first_True, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type2_get_first_False, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type3_get_first_True, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type1, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type1, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type1, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type1, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type0, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type5, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type4, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_True_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_False_with_semicolon_suffix_False, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_False_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_True_with_semicolon_suffix_False, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_True_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type1, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type5, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type4, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_dict, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type3, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_fpath_from_key_un_pickle_able_on_disk_cache_type0, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_fpath_from_key_un_pickle_able_on_disk_cache_type1, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_version_bump_on_disk_cache_type0, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_version_bump_on_disk_cache_type1 2025-12-04T15:34:02.7665781Z 2025-12-04T15:34:02.7666347Z Finished inductor/test_cache 1/1 ... [2025-12-04 15:34:02.691898][22800.301804729], took 0.90min 2025-12-04T15:34:02.7667570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.xml 2025-12-04T15:34:02.8240481Z Running dynamo/test_config 1/1 ... [2025-12-04 15:34:02.823730][22800.433637411] 2025-12-04T15:34:02.8241044Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:02.8244075Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:02.824166] 2025-12-04T15:34:11.1007783Z 2025-12-04T15:34:11.1008769Z dynamo/test_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_config_1.1_34b955669d56d548_.log 2025-12-04T15:34:11.1011262Z Running 5 items in this shard: test/dynamo/test_config.py::ConfigTests::test_automatic_dynamic, test/dynamo/test_config.py::ConfigTests::test_config_compile_ignored, test/dynamo/test_config.py::ConfigTests::test_config_hash, test/dynamo/test_config.py::ConfigTests::test_no_assume_static_by_default, test/dynamo/test_config.py::ConfigTests::test_no_automatic_dynamic 2025-12-04T15:34:11.1013096Z 2025-12-04T15:34:11.1013415Z Finished dynamo/test_config 1/1 ... [2025-12-04 15:34:11.100527][22808.710436756], took 0.14min 2025-12-04T15:34:11.1351655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.xml 2025-12-04T15:34:11.2165097Z Running dynamo/test_metrics_context 1/1 ... [2025-12-04 15:34:11.216227][22808.82611982] 2025-12-04T15:34:11.2165685Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:11.2169271Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_metrics_context.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:11.216669] 2025-12-04T15:34:16.7894751Z 2025-12-04T15:34:16.7895834Z dynamo/test_metrics_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_metrics_context_1.1_5c0162a494019d34_.log 2025-12-04T15:34:16.7900634Z Running 9 items in this shard: test/dynamo/test_metrics_context.py::TestMetricsContext::test_add_to_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_context_exists, test/dynamo/test_metrics_context.py::TestMetricsContext::test_nested_context, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_disallow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_key_value, test/dynamo/test_metrics_context.py::TestMetricsContext::test_top_n, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_allow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_disallow_overwrite 2025-12-04T15:34:16.7904548Z 2025-12-04T15:34:16.7904921Z Finished dynamo/test_metrics_context 1/1 ... [2025-12-04 15:34:16.789239][22814.39914817], took 0.09min 2025-12-04T15:34:16.8240251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.xml 2025-12-04T15:34:16.8590747Z Running export/test_package 1/1 ... [2025-12-04 15:34:16.858782][22814.468688592] 2025-12-04T15:34:16.8591317Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:16.8594671Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:16.859211] 2025-12-04T15:34:22.7326575Z 2025-12-04T15:34:22.7327817Z export/test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_package_1.1_c7910f2956ab0b71_.log 2025-12-04T15:34:22.7329809Z Running 4 items in this shard: test/export/test_package.py::TestPackage::test_basic, test/export/test_package.py::TestPackage::test_error, test/export/test_package.py::TestPackage::test_more_than_once, test/export/test_package.py::TestPackage::test_overloads 2025-12-04T15:34:22.7331124Z 2025-12-04T15:34:22.7331475Z Finished export/test_package 1/1 ... [2025-12-04 15:34:22.732445][22820.342355037], took 0.10min 2025-12-04T15:34:22.7672833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.xml 2025-12-04T15:34:22.8024808Z Running dynamo/test_nops 1/1 ... [2025-12-04 15:34:22.802192][22820.412101026] 2025-12-04T15:34:22.8025362Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:22.8028600Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:22.802610] 2025-12-04T15:34:28.8762772Z 2025-12-04T15:34:28.8763724Z dynamo/test_nops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nops_1.1_eec8955a89c0749e_.log 2025-12-04T15:34:28.8765452Z Running 4 items in this shard: test/dynamo/test_nops.py::NopTests::test1, test/dynamo/test_nops.py::NopTests::test2, test/dynamo/test_nops.py::NopTests::test3, test/dynamo/test_nops.py::NopTests::test_extended_args 2025-12-04T15:34:28.8766525Z 2025-12-04T15:34:28.8766829Z Finished dynamo/test_nops 1/1 ... 
[2025-12-04 15:34:28.876065][22826.485973511], took 0.10min 2025-12-04T15:34:28.9108746Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.xml 2025-12-04T15:34:28.9957663Z Running inductor/test_graph_transform_observer 1/1 ... [2025-12-04 15:34:28.995416][22826.605324831] 2025-12-04T15:34:28.9958347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:28.9961145Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_graph_transform_observer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:28.995855] 2025-12-04T15:34:39.1755987Z 2025-12-04T15:34:39.1757449Z inductor/test_graph_transform_observer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_graph_transform_observer_1.1_2166094392cbcf10_.log 2025-12-04T15:34:39.1759098Z Running 1 items in this shard: test/inductor/test_graph_transform_observer.py::TestGraphTransformObserver::test_sdpa_rewriter 2025-12-04T15:34:39.1759754Z 2025-12-04T15:34:39.1760200Z Finished inductor/test_graph_transform_observer 1/1 ... [2025-12-04 15:34:39.175384][22836.785293487], took 0.17min 2025-12-04T15:34:39.2104318Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.xml 2025-12-04T15:34:39.2819659Z Running export/test_db 1/1 ... [2025-12-04 15:34:39.281670][22836.891578869] 2025-12-04T15:34:39.2820217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:39.2823274Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_db.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:34:39.282097] 2025-12-04T15:34:50.5124082Z 2025-12-04T15:34:50.5124995Z export/test_db 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_db_1.1_e88cbc04d8a44796_.log 2025-12-04T15:34:50.5141126Z Running 36 items in this shard: test/export/test_db.py::ExampleTests::test_exportdb_not_supported_case_dynamic_shape_round, test/export/test_db.py::ExampleTests::test_exportdb_not_supported_case_unsupported_operator, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_assume_constant_result, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_autograd_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_class_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_class_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_nested_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_nonlocal_variables, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_closed_over_variable, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_operands, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_predicate, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_constrain_as_size_example, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_constrain_as_value_example, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_decorator, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dictionary, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_assert, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_constructor, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_if_guard, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_map, 
test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_slicing, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_view, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_fn_with_kwargs, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_list_contains, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_list_unpack, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_model_attr_mutation, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_nested_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_null_context_manager, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_optional_input, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_pytree_flatten, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_scalar_output, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_specialized_attribute, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_static_for_loop, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_static_if, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_tensor_setattr, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_type_reflection_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_user_input_mutation 2025-12-04T15:34:50.5156767Z 2025-12-04T15:34:50.5157067Z Finished export/test_db 1/1 ... [2025-12-04 15:34:50.512240][22848.12214896], took 0.19min 2025-12-04T15:34:50.5474656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.xml 2025-12-04T15:34:50.6351264Z Running dynamo/test_export_mutations 1/1 ... 
[2025-12-04 15:34:50.634812][22848.244719678] 2025-12-04T15:34:50.6351892Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:50.6354749Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export_mutations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:50.635238] 2025-12-04T15:34:58.5112200Z 2025-12-04T15:34:58.5113601Z dynamo/test_export_mutations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_mutations_1.1_68937c62c4814f0f_.log 2025-12-04T15:34:58.5117377Z Running 5 items in this shard: test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_1, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_2, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_3, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_4, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_positive_1 2025-12-04T15:34:58.5120400Z 2025-12-04T15:34:58.5120785Z Finished dynamo/test_export_mutations 1/1 ... [2025-12-04 15:34:58.510979][22856.12088544], took 0.13min 2025-12-04T15:34:58.5469743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.xml 2025-12-04T15:34:58.6291565Z Running inductor/test_config 1/1 ... 
[2025-12-04 15:34:58.628822][22856.23872865] 2025-12-04T15:34:58.6292133Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:34:58.6295437Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:58.629282] 2025-12-04T15:35:17.6726751Z 2025-12-04T15:35:17.6727912Z inductor/test_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_config_1.1_8da77f3c96eb0a54_.log 2025-12-04T15:35:17.6734194Z Running 14 items in this shard: test/inductor/test_config.py::TestInductorConfig::test_api_options, test/inductor/test_config.py::TestInductorConfig::test_codegen_skips_custom_passes, test/inductor/test_config.py::TestInductorConfig::test_compile_api, test/inductor/test_config.py::TestInductorConfig::test_compile_api_passes_config, test/inductor/test_config.py::TestInductorConfig::test_get_compiler_config, test/inductor/test_config.py::TestInductorConfig::test_hasattr, test/inductor/test_config.py::TestInductorConfig::test_invalid_backend, test/inductor/test_config.py::TestInductorConfig::test_invalid_names, test/inductor/test_config.py::TestInductorConfig::test_non_inductor_backend, test/inductor/test_config.py::TestInductorConfig::test_options_do_something, test/inductor/test_config.py::TestInductorConfig::test_patch, test/inductor/test_config.py::TestInductorConfig::test_save_load, test/inductor/test_config.py::TestInductorConfig::test_select_decomp_table_fallback_embedding_bag_byte_unpack, test/inductor/test_config.py::TestInductorConfig::test_set 2025-12-04T15:35:17.6740005Z 2025-12-04T15:35:17.6740339Z Finished inductor/test_config 1/1 ... 
[2025-12-04 15:35:17.672437][22875.282345389], took 0.32min 2025-12-04T15:35:17.7088850Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.xml 2025-12-04T15:35:17.7941111Z Running inductor/test_dependencies 1/1 ... [2025-12-04 15:35:17.793793][22875.403699783] 2025-12-04T15:35:17.7941734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:17.7944887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_dependencies.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:17.794242] 2025-12-04T15:35:28.1240901Z 2025-12-04T15:35:28.1242009Z inductor/test_dependencies 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_dependencies_1.1_a229a828add2b21e_.log 2025-12-04T15:35:28.1245454Z Running 5 items in this shard: test/inductor/test_dependencies.py::TestDependencies::test_bucketize_dependencies_no_sorter, test/inductor/test_dependencies.py::TestDependencies::test_bucketize_dependencies_sorter, test/inductor/test_dependencies.py::TestDependencies::test_get_offset, test/inductor/test_dependencies.py::TestDependencies::test_normalize_with_stride_order_equal, test/inductor/test_dependencies.py::TestDependencies::test_normalize_with_stride_order_unequal 2025-12-04T15:35:28.1248008Z 2025-12-04T15:35:28.1248379Z Finished inductor/test_dependencies 1/1 ... [2025-12-04 15:35:28.123859][22885.733768269], took 0.17min 2025-12-04T15:35:28.1594430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.xml 2025-12-04T15:35:28.2466550Z Running inductor/test_fuzzer 1/1 ... 
[2025-12-04 15:35:28.246328][22885.856235643] 2025-12-04T15:35:28.2467146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:28.2470172Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fuzzer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:28.246759] 2025-12-04T15:35:49.8441110Z 2025-12-04T15:35:49.8442093Z inductor/test_fuzzer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fuzzer_1.1_7ef41a4207e7fec8_.log 2025-12-04T15:35:49.8447441Z Running 11 items in this shard: test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_bisector_boolean, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_bisector_exception, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_bisect, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_cpu, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_gpu, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_n_tuple, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_fuzzer_inductor_calling_compile, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_fuzzer_running_test, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_sampling_method_random, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_sampling_method_toggle 2025-12-04T15:35:49.8452014Z 2025-12-04T15:35:49.8452365Z Finished inductor/test_fuzzer 1/1 ... [2025-12-04 15:35:49.843861][22907.453769513], took 0.36min 2025-12-04T15:35:49.8799911Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.xml 2025-12-04T15:35:49.9873148Z Running dynamo/test_global 1/1 ... 
[2025-12-04 15:35:49.986944][22907.59685115] 2025-12-04T15:35:49.9873776Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:35:49.9876486Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_global.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:49.987372] 2025-12-04T15:36:06.0268960Z 2025-12-04T15:36:06.0270122Z dynamo/test_global 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_global_1.1_be67321ce36fdfe2_.log 2025-12-04T15:36:06.0275521Z Running 12 items in this shard: test/dynamo/test_global.py::TestGlobals::test_store_global_1, test/dynamo/test_global.py::TestGlobals::test_store_global_2, test/dynamo/test_global.py::TestGlobals::test_store_global_cross_file, test/dynamo/test_global.py::TestGlobals::test_store_global_crossfile_inline, test/dynamo/test_global.py::TestGlobals::test_store_global_dict, test/dynamo/test_global.py::TestGlobals::test_store_global_dict_2, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_1, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_2, test/dynamo/test_global.py::TestGlobals::test_store_global_list, test/dynamo/test_global.py::TestGlobals::test_store_global_list_2, test/dynamo/test_global.py::TestGlobals::test_store_global_new, test/dynamo/test_global.py::TestGlobals::test_store_global_object 2025-12-04T15:36:06.0279614Z 2025-12-04T15:36:06.0279932Z Finished dynamo/test_global 1/1 ... [2025-12-04 15:36:06.026671][22923.63658009], took 0.27min 2025-12-04T15:36:06.0625815Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.xml 2025-12-04T15:36:06.1382424Z Running inductor/test_control_flow 1/4 ... 
[2025-12-04 15:36:06.137933][22923.7478397] 2025-12-04T15:36:06.1383011Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:36:06.1386588Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:36:06.138390] 2025-12-04T15:51:12.2748291Z 2025-12-04T15:51:12.2749419Z inductor/test_control_flow 1/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_1.4_b6ec092c04daf6c8_.log 2025-12-04T15:51:12.2983348Z Running 190 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_reintepret_view_inputs_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cpu, 
test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_True 2025-12-04T15:51:12.3189380Z 2025-12-04T15:51:12.3197732Z Finished inductor/test_control_flow 1/4 ... [2025-12-04 15:51:12.319558][23829.929457464], took 15.10min 2025-12-04T15:51:12.3560227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.xml 2025-12-04T15:51:12.4354980Z Running dynamo/test_cudagraphs 1/1 ... 
[2025-12-04 15:51:12.435170][23830.045077955] 2025-12-04T15:51:12.4355759Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:51:12.4358392Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:12.435574] 2025-12-04T15:51:21.8133662Z 2025-12-04T15:51:21.8134698Z dynamo/test_cudagraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_1.1_f31f593cd6865772_.log 2025-12-04T15:51:21.8138273Z Running 8 items in this shard: test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutated_metadata 2025-12-04T15:51:21.8141180Z 2025-12-04T15:51:21.8141546Z Finished dynamo/test_cudagraphs 1/1 ... [2025-12-04 15:51:21.813160][23839.423070409], took 0.16min 2025-12-04T15:51:21.8488547Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.xml 2025-12-04T15:51:21.9271763Z Running inductor/test_alignment 1/1 ... 
[2025-12-04 15:51:21.926893][23839.536800468] 2025-12-04T15:51:21.9272351Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:51:21.9275632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_alignment.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:21.927318] 2025-12-04T15:51:42.9719475Z 2025-12-04T15:51:42.9720539Z inductor/test_alignment 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_alignment_1.1_c850ab1c90ef7284_.log 2025-12-04T15:51:43.0089909Z Running 12 items in this shard: test/inductor/test_alignment.py::GPUTests::test_Q4_K_dequantization_cuda, test/inductor/test_alignment.py::GPUTests::test_alignment_without_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_incorrect_meta_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1024_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1048576_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_128_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_cuda, test/inductor/test_alignment.py::GPUTests::test_view_dtype_slice_cuda 2025-12-04T15:51:43.0094549Z 2025-12-04T15:51:43.0094913Z Finished inductor/test_alignment 1/1 ... 
[2025-12-04 15:51:42.971741][23860.581648143], took 0.35min 2025-12-04T15:51:43.0096198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.xml 2025-12-04T15:51:43.0920557Z Running dynamo/test_profiler 1/1 ... [2025-12-04 15:51:43.091707][23860.701615104] 2025-12-04T15:51:43.0921156Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:51:43.0924265Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:43.092149] 2025-12-04T15:52:01.6344248Z 2025-12-04T15:52:01.6345563Z dynamo/test_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_profiler_1.1_bdf79e2257b8f437_.log 2025-12-04T15:52:01.6351029Z Running 10 items in this shard: test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_backend_compile, test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_isolated, test/dynamo/test_profiler.py::DynamoProfilerTests::test_execution_trace_dynamic_shapes, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_list_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_runtime, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup_profiler_step, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_dynamo_compiled_region, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_enabled_export 2025-12-04T15:52:01.6355814Z 2025-12-04T15:52:01.6356143Z Finished dynamo/test_profiler 1/1 ... 
[2025-12-04 15:52:01.634230][23879.24413783], took 0.31min 2025-12-04T15:52:01.6705938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.xml 2025-12-04T15:52:01.7499993Z Running dynamo/test_guard_serialization 1/1 ... [2025-12-04 15:52:01.749664][23879.35957141] 2025-12-04T15:52:01.7500606Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:52:01.7504182Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:52:01.750122] 2025-12-04T15:52:25.9623654Z 2025-12-04T15:52:25.9625277Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_ca95c718e2b65acd_.log 2025-12-04T15:52:25.9654740Z Running 56 items in this shard: test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bool_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_patched_forward, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_empty, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_builtin_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_c10d_work, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_class_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_var_missing, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_constant_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_ddp_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_default_device, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_deterministic_algorithms, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_contains, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_version, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dispatch_key_set_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dual_level, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_duplicate_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_empty_nn_module_hooks_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_equals_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_fsdp_training_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_locals, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_with_wrong_fqn, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_functorch_stack_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_global_state_guard_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode_loading, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_guard_on_key_order_with_cache, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_hasattr_serialization, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match_with_config, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_mapping_keys_check, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_nn_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_none_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_not_present_in_generic_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_range_iterator_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sdp_backend_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sequence_length, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_shape_env, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_skipped_objects, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_subclass_metadata_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tuple_iterator_len, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_type_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_sharded_tensor, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_submodule, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_process_group, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_stream, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_weakref, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_weakref_alive, test/dynamo/test_guard_serialization.py::TestGuardSerializationFSDP::test_guard_serialization_fsdp_module 2025-12-04T15:52:25.9680835Z 2025-12-04T15:52:25.9681237Z Finished dynamo/test_guard_serialization 1/1 ... [2025-12-04 15:52:25.962215][23903.572122893], took 0.40min 2025-12-04T15:52:25.9998800Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.xml 2025-12-04T15:52:26.0860948Z Running dynamo/test_dicts 1/1 ... [2025-12-04 15:52:26.085782][23903.695689497] 2025-12-04T15:52:26.0861493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:52:26.0864668Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dicts.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:52:26.086227] 2025-12-04T15:52:52.8890559Z 2025-12-04T15:52:52.8891751Z dynamo/test_dicts 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dicts_1.1_9286d343eb07609f_.log 2025-12-04T15:52:52.8939148Z Running 140 items in this shard: test/dynamo/test_dicts.py::DictTests::test_builtin_ior_, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_diff_keys, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_invalid_types, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_same_keys, test/dynamo/test_dicts.py::DictTests::test_construct_user_dict_and_return, test/dynamo/test_dicts.py::DictTests::test_contains_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_contains_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_custom_iter_dict, test/dynamo/test_dicts.py::DictTests::test_custom_keys_iter_dict, test/dynamo/test_dicts.py::DictTests::test_dict_construct_from_mapping_like, test/dynamo/test_dicts.py::DictTests::test_dict_construction_from_mapping_proxy, test/dynamo/test_dicts.py::DictTests::test_dict_contains, test/dynamo/test_dicts.py::DictTests::test_dict_contains_enum, test/dynamo/test_dicts.py::DictTests::test_dict_copy_alias, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order2, test/dynamo/test_dicts.py::DictTests::test_dict_iter, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_and_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_or_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_sub, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_xor, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_iand, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ior, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_isub, 
test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ixor, test/dynamo/test_dicts.py::DictTests::test_dict_list_values, test/dynamo/test_dicts.py::DictTests::test_dict_mutation_side_effect, test/dynamo/test_dicts.py::DictTests::test_dict_namedtuple, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_modules, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_tensors, test/dynamo/test_dicts.py::DictTests::test_dict_reconstruct_keeps_original_order, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_contains, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_get_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_initialization_in_graph, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation_return, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_with_non_dict_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_readonly, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_setitem, test/dynamo/test_dicts.py::DictTests::test_dict_tag_guard, test/dynamo/test_dicts.py::DictTests::test_empty_dict_recompilation, test/dynamo/test_dicts.py::DictTests::test_fn_id, test/dynamo/test_dicts.py::DictTests::test_items_type, test/dynamo/test_dicts.py::DictTests::test_iter_default_dict, test/dynamo/test_dicts.py::DictTests::test_lazy_key_guarding, test/dynamo/test_dicts.py::DictTests::test_lazy_key_non_const_guarding, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_ban_muation_on_dict_realization, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_local_mutation, 
test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_mutation, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_local, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_nonlocal, test/dynamo/test_dicts.py::DictTests::test_move_to_end, test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict, test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict_no_default_factory, test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict_with_dict, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_subclass_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_overridden_get_item, test/dynamo/test_dicts.py::DictTests::test_udf_dict_reconstruction, test/dynamo/test_dicts.py::DictTests::test_update_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_update_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_weakref_dict, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_eq, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ior, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ne, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_or, test/dynamo/test_dicts.py::DictGuardTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictMethodsTests::test_clear, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictMethodsTests::test_copy, test/dynamo/test_dicts.py::DictMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::DictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictMethodsTests::test_functools_partial_key, 
test/dynamo/test_dicts.py::DictMethodsTests::test_get, test/dynamo/test_dicts.py::DictMethodsTests::test_items, test/dynamo/test_dicts.py::DictMethodsTests::test_keys, test/dynamo/test_dicts.py::DictMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::DictMethodsTests::test_pop, test/dynamo/test_dicts.py::DictMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictMethodsTests::test_type, test/dynamo/test_dicts.py::DictMethodsTests::test_update, test/dynamo/test_dicts.py::DictMethodsTests::test_values, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_clear, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_copy, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_functools_partial_key, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_get, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_items, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_keys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_pop, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_type, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_update, 
test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_return_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or_return_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_clear, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq_order, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_copy, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_functools_partial_key, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_get, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_items, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_keys, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_move_to_end, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_pop, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem_kwarg, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_update, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictSubclassOverload::test_move_to_end 2025-12-04T15:52:52.8985551Z 2025-12-04T15:52:52.8985876Z 
Finished dynamo/test_dicts 1/1 ... [2025-12-04 15:52:52.889062][23930.49896874], took 0.45min 2025-12-04T15:52:52.9345890Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.xml 2025-12-04T15:52:53.0377114Z Running dynamo/test_optimizers 1/1 ... [2025-12-04 15:52:53.037413][23930.647318861] 2025-12-04T15:52:53.0377709Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:52:53.0381687Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_optimizers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:52:53.037846] 2025-12-04T15:53:02.0149657Z 2025-12-04T15:53:02.0150720Z dynamo/test_optimizers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_optimizers_1.1_6e8896f6f8ab34bf_.log 2025-12-04T15:53:02.0152673Z Running 3 items in this shard: test/dynamo/test_optimizers.py::End2EndTests::test_init_group, test/dynamo/test_optimizers.py::End2EndTests::test_optimizing_over_tensor_with_requires_grad, test/dynamo/test_optimizers.py::End2EndTests::test_state_dict 2025-12-04T15:53:02.0153981Z 2025-12-04T15:53:02.0154335Z Finished dynamo/test_optimizers 1/1 ... [2025-12-04 15:53:02.014740][23939.624650069], took 0.15min 2025-12-04T15:53:02.0515936Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.xml 2025-12-04T15:53:02.1544465Z Running export/test_torchbind 1/1 ... 
[2025-12-04 15:53:02.154182][23939.76408984] 2025-12-04T15:53:02.1545032Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:02.1548248Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_torchbind.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:02.154584] 2025-12-04T15:53:37.1213321Z 2025-12-04T15:53:37.1214370Z export/test_torchbind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_torchbind_1.1_2a7aef954986f1ed_.log 2025-12-04T15:53:37.1263915Z Running 90 items in this shard: test/export/test_torchbind.py::TestExportTorchbind::test_aot_export_tensor_queue_operators, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_as_custom_op_argument_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_as_custom_op_argument_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_list_out_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_list_out_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_tuple_out_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_tuple_out_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_unbacked_symint_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_unbacked_symint_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_deepcopy, test/export/test_torchbind.py::TestExportTorchbind::test_export_inplace_custom_op, test/export/test_torchbind.py::TestExportTorchbind::test_identifying_torchbind_ops, 
test/export/test_torchbind.py::TestExportTorchbind::test_input_as_custom_op_argument_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_input_as_custom_op_argument_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_input_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_input_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_schema_checking_script_object, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_fakify_internal_states_make_fx_tracing_mode_fake, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_fakify_internal_states_make_fx_tracing_mode_symbolic, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_make_fx_tracing_mode_fake, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_make_fx_tracing_mode_symbolic, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_operators_fallthrough_via_lib_impl, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_operators_fallthrough_via_py_impl, test/export/test_torchbind.py::TestExportTorchbind::test_method_schema, test/export/test_torchbind.py::TestExportTorchbind::test_non_strict_export_methods, test/export/test_torchbind.py::TestExportTorchbind::test_none_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_none_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_safe_to_trace_with_real, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_alias_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_alias_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_input_and_alias_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_input_and_alias_pre_dispatch_True, 
test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_op_fallthrough_keys_respects_lib_impl, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_op_register_fallthrough, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_register_attr_at_runtime_get_restored, test/export/test_torchbind.py::TestExportTorchbind::test_unlift_custom_obj_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_unlift_custom_obj_pre_dispatch_True, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_missing_attr_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_missing_attr_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_setattr_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_setattr_backend_eager, 
test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_graph_breaks, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_inductor, 
test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_automatic_dynamic_shape, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_export_obj_torchbind_op_with_autocast_device_cpu, test/export/test_torchbind.py::TestCompileTorchbind::test_export_obj_torchbind_op_with_autocast_device_cuda, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_from_real_not_classmethod, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_no_from_real, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_no_torch_bind_class, 
test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_valid 2025-12-04T15:53:37.1310693Z 2025-12-04T15:53:37.1311042Z Finished export/test_torchbind 1/1 ... [2025-12-04 15:53:37.122513][23974.73241564], took 0.58min 2025-12-04T15:53:37.1602507Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.xml 2025-12-04T15:53:38.5908084Z Uploading artifacts took 1.35 seconds 2025-12-04T15:53:38.5912140Z Running dynamo/test_python_dispatcher 1/1 ... [2025-12-04 15:53:38.591029][23976.200936279] 2025-12-04T15:53:38.5913037Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:38.5916932Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_dispatcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:38.591466] 2025-12-04T15:53:46.7173032Z 2025-12-04T15:53:46.7174077Z dynamo/test_python_dispatcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_dispatcher_1.1_d5e45034fa548233_.log 2025-12-04T15:53:46.7177634Z Running 6 items in this shard: test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key1, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key2, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key3, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key4, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key_set_guard, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_functorch_interpreter 2025-12-04T15:53:46.7180442Z 2025-12-04T15:53:46.7180829Z Finished dynamo/test_python_dispatcher 1/1 ... 
[2025-12-04 15:53:46.717068][23984.326977713], took 0.14min 2025-12-04T15:53:46.7545625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.xml 2025-12-04T15:53:46.8284532Z Running export/test_swap 1/1 ... [2025-12-04 15:53:46.828128][23984.438036132] 2025-12-04T15:53:46.8285109Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:46.8287884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:46.828549] 2025-12-04T15:53:57.1071579Z 2025-12-04T15:53:57.1073084Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_75b32b5d64f61c05_.log 2025-12-04T15:53:57.1080992Z Running 20 items in this shard: test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_args, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_nonstrict::test_custom_output, test/export/test_swap.py::TestSwap_nonstrict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_nonstrict::test_nested_leaf, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_with_unused_input, test/export/test_swap.py::TestSwap_strict::test_custom_input_args, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs_use_private, 
test/export/test_swap.py::TestSwap_strict::test_custom_output, test/export/test_swap.py::TestSwap_strict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_strict::test_nested_leaf, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_with_unused_input 2025-12-04T15:53:57.1088938Z 2025-12-04T15:53:57.1089248Z Finished export/test_swap 1/1 ... [2025-12-04 15:53:57.106959][23994.716868243], took 0.17min 2025-12-04T15:53:57.1436491Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.xml 2025-12-04T15:53:57.2320405Z Running export/test_unflatten 1/1 ... [2025-12-04 15:53:57.231723][23994.841631985] 2025-12-04T15:53:57.2320982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:53:57.2323636Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:53:57.232109] 2025-12-04T15:54:21.6313429Z 2025-12-04T15:54:21.6316182Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_e240ad71aaf7be43_.log 2025-12-04T15:54:21.6328507Z Running 29 items in this shard: test/export/test_unflatten.py::TestUnflatten::test_assert_tensor_metadata_stack, test/export/test_unflatten.py::TestUnflatten::test_attr_as_submod_input, test/export/test_unflatten.py::TestUnflatten::test_dedup_sym_size, test/export/test_unflatten.py::TestUnflatten::test_double_nested_submodule, test/export/test_unflatten.py::TestUnflatten::test_duplicate_placeholder, test/export/test_unflatten.py::TestUnflatten::test_fx_trace, test/export/test_unflatten.py::TestUnflatten::test_nested_leaf_non_strict, test/export/test_unflatten.py::TestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened, test/export/test_unflatten.py::TestUnflatten::test_simple_alias, test/export/test_unflatten.py::TestUnflatten::test_unflatten_buffer_mutation, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_obj, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_tensor, test/export/test_unflatten.py::TestUnflatten::test_unflatten_container_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_eager, test/export/test_unflatten.py::TestUnflatten::test_unflatten_empty_branch, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested_access, test/export/test_unflatten.py::TestUnflatten::test_unflatten_none, test/export/test_unflatten.py::TestUnflatten::test_unflatten_param_list_dict, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_signature, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_with_unused_input, test/export/test_unflatten.py::TestUnflatten::test_unflatten_requires_grad_param, 
test/export/test_unflatten.py::TestUnflatten::test_unflatten_root_module_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_shared_submodule, test/export/test_unflatten.py::TestUnflatten::test_unflatten_skipped_call_module, test/export/test_unflatten.py::TestUnflatten::test_unflatten_submodule_ordering, test/export/test_unflatten.py::TestUnflatten::test_unflatten_with_inplace_compile, test/export/test_unflatten.py::TestUnflatten::test_unflatten_wrong_input, test/export/test_unflatten.py::TestUnflatten::test_unflattened_module_nodes_has_meta_val 2025-12-04T15:54:21.6340093Z 2025-12-04T15:54:21.6340435Z Finished export/test_unflatten 1/1 ... [2025-12-04 15:54:21.631104][24019.24101308], took 0.41min 2025-12-04T15:54:21.6685290Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.xml 2025-12-04T15:54:21.7344606Z Running dynamo/test_verify_correctness 1/1 ... [2025-12-04 15:54:21.734181][24019.344090123] 2025-12-04T15:54:21.7345248Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:54:21.7348233Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:54:21.734602] 2025-12-04T15:54:29.7104628Z 2025-12-04T15:54:29.7105753Z dynamo/test_verify_correctness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_verify_correctness_1.1_c32bdac20cc2dbcb_.log 2025-12-04T15:54:29.7108663Z Running 4 items in this shard: test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_example_inputs, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_false, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_true, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_torchscript 2025-12-04T15:54:29.7110605Z 2025-12-04T15:54:29.7111149Z Finished dynamo/test_verify_correctness 1/1 ... [2025-12-04 15:54:29.710269][24027.320176707], took 0.13min 2025-12-04T15:54:29.7477087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.xml 2025-12-04T15:54:29.8251946Z Running inductor/test_fxir_backend 1/1 ... [2025-12-04 15:54:29.824848][24027.434755369] 2025-12-04T15:54:29.8252884Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:54:29.8257231Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fxir_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:54:29.825338] 2025-12-04T15:55:34.3417254Z 2025-12-04T15:55:34.3418285Z inductor/test_fxir_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fxir_backend_1.1_615cfb6d9761ce74_.log 2025-12-04T15:55:34.3447994Z Running 73 items in this shard: test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_backward, test/inductor/test_fxir_backend.py::FxirTestCase::test_basic, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_inputs, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_reinterpret_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_to_alloc, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_views, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_no_operands_pred_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_no_operands_pred_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_subgraph_pred_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_subgraph_pred_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_cpp_raises, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_compiler, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_triton, test/inductor/test_fxir_backend.py::FxirTestCase::test_debug, test/inductor/test_fxir_backend.py::FxirTestCase::test_device_type, test/inductor/test_fxir_backend.py::FxirTestCase::test_duplicate_input, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_launch_grid_calc, 
test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_and_strides, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_precomputed_size, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape0, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape1, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape2, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1_5, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern_multi_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_fallback, test/inductor/test_fxir_backend.py::FxirTestCase::test_fallback_tuple_constant_arg, test/inductor/test_fxir_backend.py::FxirTestCase::test_free, test/inductor/test_fxir_backend.py::FxirTestCase::test_index_put_fallback, test/inductor/test_fxir_backend.py::FxirTestCase::test_multiple_kernels, test/inductor/test_fxir_backend.py::FxirTestCase::test_output_slice_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_reshape_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_scatter_fallback_scalar_src, test/inductor/test_fxir_backend.py::FxirTestCase::test_scatter_reduce_fallback, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_const, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_dynamic, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_linear, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_dynamic_shape_pred_scalar_closure_length_4, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_dynamic_shape_pred_scalar_closure_length_8, 
test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_multi_inputs_and_outputs_pred_False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_multi_inputs_and_outputs_pred_True, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_const_folded_subgraph, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_backend, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_triton_autotune_dynamic, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dims_dynamic_outer_static_padded_inner, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_input_expr_expr0, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_input_expr_expr1, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_scalar_output, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__1_5, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__2, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__1_5, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__2, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_mismatched_branch_dynamic_pred_False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_mismatched_branch_dynamic_pred_True, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_reshape_dynamic_ph, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_reshape_dynamic_tmd, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_launch_grid_dynamic_padding, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_no_distribute_mul_floordiv, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_no_rewrite_div, 
test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rational_multi_pows, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_mul_pow, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_mul_rational, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_nested, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_rational_const, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_variable_exp 2025-12-04T15:55:34.3476831Z 2025-12-04T15:55:34.3477180Z Finished inductor/test_fxir_backend 1/1 ... [2025-12-04 15:55:34.341631][24091.951536658], took 1.08min 2025-12-04T15:55:34.3788241Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.xml 2025-12-04T15:55:34.4595613Z Running dynamo/test_structured_trace 1/1 ... [2025-12-04 15:55:34.459251][24092.069159625] 2025-12-04T15:55:34.4596192Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:55:34.4599299Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_structured_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:55:34.459665] 2025-12-04T15:56:20.4921666Z 2025-12-04T15:56:20.4922758Z dynamo/test_structured_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_structured_trace_1.1_e2032e57f1fbb9a7_.log 2025-12-04T15:56:20.4936309Z Running 29 items in this shard: test/dynamo/test_structured_trace.py::StructuredTraceTest::test_chromium_event, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_codecache, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_collective_schedule_empty, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_collective_schedule_real, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compile_id_serialization_deserialization, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_attribution, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_chromium, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_id, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_cudagraphs, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_dump_file, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_dynamo_error, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_example_fn, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_example_training_fn, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_breaks, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_execution_order, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_sizes_dynamic, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_guards_recompiles, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_inductor_error, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_make_fx_fail_partial, 
test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompile_user_contexts, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompile_user_contexts_iteration, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompiles, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_runtime_estimates_mixed, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_runtime_estimates_simple, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_schedule, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging_dynamic_shapes, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging_multiple_ops 2025-12-04T15:56:20.4949004Z 2025-12-04T15:56:20.4949372Z Finished dynamo/test_structured_trace 1/1 ... [2025-12-04 15:56:20.492030][24138.101936537], took 0.77min 2025-12-04T15:56:20.5289231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.xml 2025-12-04T15:56:20.6142067Z Running dynamo/test_torchrec 1/1 ... [2025-12-04 15:56:20.613965][24138.223875256] 2025-12-04T15:56:20.6142645Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:20.6145364Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_torchrec.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:20.614316] 2025-12-04T15:56:25.4530450Z 2025-12-04T15:56:25.4531400Z dynamo/test_torchrec 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_torchrec_1.1_ef7e4418db36eb14_.log 2025-12-04T15:56:25.4532551Z Running 0 items in this shard: 2025-12-04T15:56:25.4532768Z 2025-12-04T15:56:25.4533119Z Finished dynamo/test_torchrec 1/1 ... [2025-12-04 15:56:25.452872][24143.062780842], took 0.08min 2025-12-04T15:56:25.4894869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.xml 2025-12-04T15:56:25.5187244Z Running test_model_exports_to_core_aten 1/1 ... [2025-12-04 15:56:25.518520][24143.128429544] 2025-12-04T15:56:25.5187831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:25.5190885Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_model_exports_to_core_aten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:25.518861] 2025-12-04T15:56:30.8404513Z 2025-12-04T15:56:30.8405665Z test_model_exports_to_core_aten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_model_exports_to_core_aten_1.1_1858ccc543938d86_.log 2025-12-04T15:56:30.8407033Z Running 1 items in this shard: test/test_model_exports_to_core_aten.py::TestQuantizePT2EModels::test_vit_aten_export 2025-12-04T15:56:30.8407642Z 2025-12-04T15:56:30.8408019Z Finished test_model_exports_to_core_aten 1/1 ... [2025-12-04 15:56:30.840236][24148.450144237], took 0.09min 2025-12-04T15:56:30.8775567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.xml 2025-12-04T15:56:30.9140070Z Running dynamo/test_precompile_context 1/1 ... 
[2025-12-04 15:56:30.913816][24148.523726995] 2025-12-04T15:56:30.9140647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:30.9143833Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_precompile_context.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:30.914170] 2025-12-04T15:56:48.4528580Z 2025-12-04T15:56:48.4529745Z dynamo/test_precompile_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_precompile_context_1.1_a5d2ca6b4ab870b9_.log 2025-12-04T15:56:48.4532094Z Running 3 items in this shard: test/dynamo/test_precompile_context.py::PrecompileContextTests::test_basic, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_editable, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_serialize_by_key 2025-12-04T15:56:48.4533653Z 2025-12-04T15:56:48.4534038Z Finished dynamo/test_precompile_context 1/1 ... [2025-12-04 15:56:48.452669][24166.062578006], took 0.29min 2025-12-04T15:56:48.4902818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.xml 2025-12-04T15:56:48.5749676Z Running dynamo/test_trace_rules 1/1 ... [2025-12-04 15:56:48.574735][24166.184644581] 2025-12-04T15:56:48.5750236Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:48.5753475Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_trace_rules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:48.575112] 2025-12-04T15:56:57.0508977Z 2025-12-04T15:56:57.0509982Z dynamo/test_trace_rules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_trace_rules_1.1_6759ebf57891eeeb_.log 2025-12-04T15:56:57.0513888Z Running 7 items in this shard: test/dynamo/test_trace_rules.py::TraceRuleTests::test_almost_impossible_missing_name, test/dynamo/test_trace_rules.py::TraceRuleTests::test_force_inline_custom_function, test/dynamo/test_trace_rules.py::TraceRuleTests::test_force_inline_torch_function, test/dynamo/test_trace_rules.py::TraceRuleTests::test_no_special_handlers_for_torch_non_c_bindings, test/dynamo/test_trace_rules.py::TraceRuleTests::test_skipfiles_inlinelist, test/dynamo/test_trace_rules.py::TraceRuleTests::test_torch_name_rule_map_updated, test/dynamo/test_trace_rules.py::TestModuleSurviveSkipFiles::test_module_survive_skip_files 2025-12-04T15:56:57.0516937Z 2025-12-04T15:56:57.0517281Z Finished dynamo/test_trace_rules 1/1 ... [2025-12-04 15:56:57.050733][24174.660639587], took 0.14min 2025-12-04T15:56:57.0883588Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.xml 2025-12-04T15:56:57.1640941Z Running export/test_upgrader 1/1 ... [2025-12-04 15:56:57.163885][24174.773794717] 2025-12-04T15:56:57.1641467Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:56:57.1644786Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_upgrader.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:56:57.164255] 2025-12-04T15:57:02.3856006Z 2025-12-04T15:57:02.3857025Z export/test_upgrader 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_upgrader_1.1_ed15a90621ede266_.log 2025-12-04T15:57:02.3860411Z Running 6 items in this shard: test/export/test_upgrader.py::TestUpgrader::test_field_renaming_chain_from_v0_complete, test/export/test_upgrader.py::TestUpgrader::test_field_renaming_chain_from_v0_missing_field, test/export/test_upgrader.py::TestUpgrader::test_field_renaming_from_v1_partial_chain, test/export/test_upgrader.py::TestUpgrader::test_nn_module_stack_error_handling_invalid_type, test/export/test_upgrader.py::TestUpgrader::test_nn_module_stack_transformation_from_v0, test/export/test_upgrader.py::TestUpgrader::test_nodes_without_metadata_handled_gracefully 2025-12-04T15:57:02.3863139Z 2025-12-04T15:57:02.3863489Z Finished export/test_upgrader 1/1 ... [2025-12-04 15:57:02.385404][24179.99531236], took 0.09min 2025-12-04T15:57:02.4235190Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.xml 2025-12-04T15:57:02.4533057Z Running dynamo/test_hooks 1/1 ... [2025-12-04 15:57:02.453129][24180.063039746] 2025-12-04T15:57:02.4533762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:02.4537238Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_hooks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:02.453486] 2025-12-04T15:57:31.9089737Z 2025-12-04T15:57:31.9090646Z dynamo/test_hooks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_hooks_1.1_66426e5cf57243c0_.log 2025-12-04T15:57:31.9104356Z Running 34 items in this shard: test/dynamo/test_hooks.py::HooksTests::test_complex_state_mutation_in_intermediary_hooks_same_on_inductor, test/dynamo/test_hooks.py::HooksTests::test_complex_state_mutation_in_intermediary_hooks_same_on_inductor_with_graph_break, test/dynamo/test_hooks.py::HooksTests::test_functools_arg_vary, test/dynamo/test_hooks.py::HooksTests::test_global_module_forward_pre_hook, test/dynamo/test_hooks.py::HooksTests::test_hook_with_closure, test/dynamo/test_hooks.py::HooksTests::test_hook_with_nested_closure, test/dynamo/test_hooks.py::HooksTests::test_input_hooks_same, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks_same_on_aot_eager, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks_same_on_inductor, test/dynamo/test_hooks.py::HooksTests::test_intermediate_hook_with_closure_aot, test/dynamo/test_hooks.py::HooksTests::test_intermediate_hook_with_closure_eager, test/dynamo/test_hooks.py::HooksTests::test_nnmodule_hook_guards, test/dynamo/test_hooks.py::HooksTests::test_no_recompile_on_hook_identity_change, test/dynamo/test_hooks.py::HooksTests::test_no_recompile_on_same_hook, test/dynamo/test_hooks.py::HooksTests::test_post_acc_grad_hook, test/dynamo/test_hooks.py::HooksTests::test_recompile, test/dynamo/test_hooks.py::HooksTests::test_register_hook_partial_guarding, test/dynamo/test_hooks.py::HooksTests::test_removed_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_local_inner, 
test/dynamo/test_hooks.py::HooksTests::test_tensor_register_global_hook, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_global_hooks_handles_in_list, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_break_handle_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_break_handle_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_multi_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_repeated_handle_not_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_repeated_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_multiple_hooks, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_multiple_hooks_handles_in_list, test/dynamo/test_hooks.py::HooksTests::test_wrap_top_frame_with_hooks 2025-12-04T15:57:31.9117152Z 2025-12-04T15:57:31.9117488Z Finished dynamo/test_hooks 1/1 ... [2025-12-04 15:57:31.908799][24209.518707425], took 0.49min 2025-12-04T15:57:31.9464405Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.xml 2025-12-04T15:57:32.0243188Z Running dynamo/test_generator 1/1 ... [2025-12-04 15:57:32.024084][24209.633993562] 2025-12-04T15:57:32.0243733Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:32.0247051Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_generator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:32.024460] 2025-12-04T15:57:42.3529843Z 2025-12-04T15:57:42.3531059Z dynamo/test_generator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_generator_1.1_f207b5be74916c07_.log 2025-12-04T15:57:42.3562556Z Running 78 items in this shard: test/dynamo/test_generator.py::GeneratorTests::test_cleanup_throw, test/dynamo/test_generator.py::GeneratorTests::test_deque_extendleft, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container0, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container1, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container2, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container3, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_generator, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_sub_generator, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains__, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains___side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_2, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_3, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_4, test/dynamo/test_generator.py::GeneratorTests::test_generator_simple, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_and_reconstruct_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_before_calling_generator, 
test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_while_reconstructing, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_outside_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_3, test/dynamo/test_generator.py::GeneratorTests::test_islice_chain, test/dynamo/test_generator.py::GeneratorTests::test_iter, test/dynamo/test_generator.py::GeneratorTests::test_list_extend, test/dynamo/test_generator.py::GeneratorTests::test_list_zip_generator, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_tensor_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_local_var_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_return_advanced_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_exhaust_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_return_tuple_generator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_zip_generator, 
test/dynamo/test_generator.py::GeneratorTests::test_zip_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_zip_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_zip_subgenerator, test/dynamo/test_generator.py::TestGeneratorSend::test_send, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_exception, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_return, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_GeneratorExit, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc0, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc1, test/dynamo/test_generator.py::TestGeneratorClose::test_close_handling_finally, test/dynamo/test_generator.py::TestGeneratorClose::test_close_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_side_effects, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorThrow::test_exception_context_with_yield, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_None_in_except_and_finally, 
test/dynamo/test_generator.py::TestGeneratorThrow::test_return_const_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_no_yield_after_throw, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_not_catch, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_raise_difference_exc, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_try_except_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_with_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_without_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_yield_finally 2025-12-04T15:57:42.3593485Z 2025-12-04T15:57:42.3593831Z Finished dynamo/test_generator 1/1 ... [2025-12-04 15:57:42.352850][24219.962757614], took 0.17min 2025-12-04T15:57:42.3910383Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.xml 2025-12-04T15:57:42.4707059Z Running export/test_verifier 1/1 ... [2025-12-04 15:57:42.470501][24220.080410947] 2025-12-04T15:57:42.4707657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:42.4711446Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_verifier.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:42.470874] 2025-12-04T15:57:51.0466685Z 2025-12-04T15:57:51.0467683Z export/test_verifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_verifier_1.1_96a0b4295b5beb1c_.log 2025-12-04T15:57:51.0472217Z Running 10 items in this shard: test/export/test_verifier.py::TestVerifier::test_ep_verifier_basic, test/export/test_verifier.py::TestVerifier::test_ep_verifier_buffer_mutate, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_buffer, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_output, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_param, test/export/test_verifier.py::TestVerifier::test_verifier_basic, test/export/test_verifier.py::TestVerifier::test_verifier_call_module, test/export/test_verifier.py::TestVerifier::test_verifier_higher_order, test/export/test_verifier.py::TestVerifier::test_verifier_nested_invalid_module, test/export/test_verifier.py::TestVerifier::test_verifier_no_functional 2025-12-04T15:57:51.0475996Z 2025-12-04T15:57:51.0476321Z Finished export/test_verifier 1/1 ... [2025-12-04 15:57:51.046493][24228.656403766], took 0.14min 2025-12-04T15:57:51.0841990Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.xml 2025-12-04T15:57:51.1661602Z Running export/test_sparse 2/2 ... [2025-12-04 15:57:51.165927][24228.775837032] 2025-12-04T15:57:51.1662158Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T15:57:51.1665453Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_sparse.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:57:51.166297] 2025-12-04T16:03:05.8392931Z 2025-12-04T16:03:05.8394053Z export/test_sparse 2/2 was successful, full logs can be found in artifacts with path test/test-reports/export.test_sparse_2.2_dc3ae5c04c4515a4_.log 2025-12-04T16:03:05.8434304Z Running 97 items in this shard: test/export/test_sparse.py::TestSparseProp::test_activation_coo, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCOO, 
test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCSR, 
test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int64_SparseCOO, 
test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int64_SparseBSR, 
test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int64_SparseBSC 2025-12-04T16:03:05.8472971Z 2025-12-04T16:03:05.8473285Z Finished export/test_sparse 2/2 ... [2025-12-04 16:03:05.839176][24543.449082951], took 5.24min 2025-12-04T16:03:05.8776741Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.xml 2025-12-04T16:03:05.9560193Z Running functorch/test_ac 1/1 ... [2025-12-04 16:03:05.955803][24543.565712728] 2025-12-04T16:03:05.9561051Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:03:05.9564602Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:03:05.956187] 2025-12-04T16:03:43.6204816Z 2025-12-04T16:03:43.6205929Z functorch/test_ac 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_1.1_99b1ba004ab023a0_.log 2025-12-04T16:03:43.6210453Z Running 9 items in this shard: test/functorch/test_ac.py::MemoryBudgetTest::test_attention_vs_linear, test/functorch/test_ac.py::MemoryBudgetTest::test_custom_triton_kernel, test/functorch/test_ac.py::MemoryBudgetTest::test_manual_ac, test/functorch/test_ac.py::MemoryBudgetTest::test_matmul_even_chain, test/functorch/test_ac.py::MemoryBudgetTest::test_matmul_uneven_chain, test/functorch/test_ac.py::MemoryBudgetTest::test_prioritize_cheaper_matmul, test/functorch/test_ac.py::MemoryBudgetTest::test_prioritize_cheaper_matmul2, test/functorch/test_ac.py::MemoryBudgetTest::test_profile, test/functorch/test_ac.py::MemoryBudgetTest::test_rematerializes_cheap 2025-12-04T16:03:43.6213661Z 2025-12-04T16:03:43.6213980Z Finished functorch/test_ac 1/1 ... [2025-12-04 16:03:43.620288][24581.230197412], took 0.63min 2025-12-04T16:03:43.6588569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.xml 2025-12-04T16:03:43.7314940Z Running test_out_dtype_op 1/1 ... [2025-12-04 16:03:43.731238][24581.341148437] 2025-12-04T16:03:43.7315560Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:03:43.7318715Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_out_dtype_op.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:03:43.731592] 2025-12-04T16:03:51.6062213Z 2025-12-04T16:03:51.6063453Z test_out_dtype_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_out_dtype_op_1.1_3e48e335f34b8277_.log 2025-12-04T16:03:51.6068367Z Running 12 items in this shard: test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_dynamo, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_int_mm_default_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_make_fx, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mm_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mul_scalar_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_no_autograd, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_op_overload, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_op_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_wrong_output 2025-12-04T16:03:51.6072579Z 2025-12-04T16:03:51.6072872Z Finished test_out_dtype_op 1/1 ... [2025-12-04 16:03:51.606030][24589.215940105], took 0.13min 2025-12-04T16:03:51.6445798Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.xml 2025-12-04T16:03:51.7280841Z Running torch_np/test_ufuncs_basic 1/1 ... [2025-12-04 16:03:51.727836][24589.337745972] 2025-12-04T16:03:51.7281513Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:03:51.7284900Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ufuncs_basic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:03:51.728192] 2025-12-04T16:03:57.7506331Z 2025-12-04T16:03:57.7507817Z torch_np/test_ufuncs_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ufuncs_basic_1.1_5b79d2f51b6173f9_.log 2025-12-04T16:03:57.7701129Z Running 371 items in this shard: test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc14, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc5, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc0_op0_iop0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc1_op1_iop1, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc0_op0_iop0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc1_op1_iop1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype1, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype_and_out
2025-12-04T16:03:57.7889732Z
2025-12-04T16:03:57.7890090Z Finished torch_np/test_ufuncs_basic 1/1 ... [2025-12-04 16:03:57.751104][24595.36101202], took 0.10min
2025-12-04T16:03:57.7894855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.xml
2025-12-04T16:03:57.8678471Z Running lazy/test_step_closures 1/1 ... [2025-12-04 16:03:57.867594][24595.477504012]
2025-12-04T16:03:57.8679176Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:03:57.8682168Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_step_closures.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:03:57.867932]
2025-12-04T16:04:04.8414570Z
2025-12-04T16:04:04.8415676Z lazy/test_step_closures 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_step_closures_1.1_f2cf8fda3341fdfb_.log
2025-12-04T16:04:04.8417859Z Running 4 items in this shard: test/lazy/test_step_closures.py::ClosuresTest::test_asynchronous, test/lazy/test_step_closures.py::ClosuresTest::test_asynchronous_exception, test/lazy/test_step_closures.py::ClosuresTest::test_synchronous, test/lazy/test_step_closures.py::ClosuresTest::test_synchronous_exception
2025-12-04T16:04:04.8419613Z
2025-12-04T16:04:04.8419954Z Finished lazy/test_step_closures 1/1 ... [2025-12-04 16:04:04.841265][24602.451174716], took 0.12min
2025-12-04T16:04:04.8802437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.xml
2025-12-04T16:04:04.9727604Z Running functorch/dim/test_getsetitem 1/1 ... [2025-12-04 16:04:04.972551][24602.582460879]
2025-12-04T16:04:04.9728208Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:04:04.9731474Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/dim/test_getsetitem.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:04:04.972920]
2025-12-04T16:04:09.9939602Z
2025-12-04T16:04:09.9940690Z functorch/dim/test_getsetitem 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.dim.test_getsetitem_1.1_f956801402f0c75a_.log
2025-12-04T16:04:09.9949276Z Running 19 items in this shard: test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_basic_dim_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_boolean_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_complex_mixed_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_device_handling_cpu, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_dim_pack_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_dimlist_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_edge_cases, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_ellipsis_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_error_conditions, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_inferred_dimension_binding, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_mixed_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_multiple_dim_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_none_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_repeated_dim_usage, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_slice_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_stride_calculation, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_tensor_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_unbound_dim_binding, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_unbound_dimlist_indexing
2025-12-04T16:04:09.9957088Z
2025-12-04T16:04:09.9957466Z Finished functorch/dim/test_getsetitem 1/1 ... [2025-12-04 16:04:09.993745][24607.603652977], took 0.08min
2025-12-04T16:04:10.0330097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.xml
2025-12-04T16:04:10.1398914Z Running test_fx 1/1 ... [2025-12-04 16:04:10.139676][24607.749585416]
2025-12-04T16:04:10.1399391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:04:10.1402989Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:04:10.140055]
2025-12-04T16:08:24.4304319Z
2025-12-04T16:08:24.4305154Z test_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.1_fe3aedf5a60597eb_.log
2025-12-04T16:08:24.4890025Z Running 1280 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu,
test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_rand_like, test/test_fx.py::TestCSEPass::test_rand_n, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_simple_multiple_same_ops, test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_dead_chain, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_dead_placeholder, test/test_fx.py::TestDCE::test_dead_placeholder_with_user, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_check_inline_non_const, test/test_fx.py::TestConstFold::test_check_inline_non_const_mult_return, test/test_fx.py::TestConstFold::test_check_skip_folding_quant_dequant_pattern, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, 
test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_no_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_placeholder_reordered, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_multi_const_folded_attrs, test/test_fx.py::TestConstFold::test_const_fold_noop, test/test_fx.py::TestConstFold::test_const_fold_partial_graph, test/test_fx.py::TestConstFold::test_const_fold_submod_hierarchy, test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_dict_output, test/test_fx.py::TestConstFold::test_do_not_fold_impure_subgraph, test/test_fx.py::TestConstFold::test_fold_module, test/test_fx.py::TestConstFold::test_fold_pure_subgraph, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_nelement_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_shape_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const, test/test_fx.py::AnnotationsTest::test_annotate, test/test_fx.py::AnnotationsTest::test_annotations, test/test_fx.py::AnnotationsTest::test_broadcasting1, test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::AnnotationsTest::test_broadcasting3, test/test_fx.py::AnnotationsTest::test_consistency, 
test/test_fx.py::AnnotationsTest::test_precision, test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_false, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2_fully_static, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_flatten3, test/test_fx.py::TypeCheckerTest::test_type_check_flatten_2, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true_param_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_False, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, 
test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TypeCheckerTest::test_typecheck_basicblock, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_module, test/test_fx.py::TestMatcher::test_split_to_graph_and_name_node_map, test/test_fx.py::TestMatcher::test_subgraph_matcher_ignore_literals, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestMatcher::test_variatic_arg_matching, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_pass_manager_checks, test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, 
test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_matching_variable_arguments, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_filters, test/test_fx.py::TestSubgraphRewriter::test_replaced_nodes, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_call_method, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_local_revert, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_nodes_with_kwargs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_consecutive_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_multiple_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replaces_referenced_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_all_input_nodes, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_empty_tuple, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_no_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_args_kwargs_no_self, test/test_fx.py::TestFX::test_ast_rewriter_reassigns_submodules, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, test/test_fx.py::TestFX::test_ast_rewriter_wrap, 
test/test_fx.py::TestFX::test_ast_rewriter_wrap_fn_directly, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_ctx_mgr, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_import, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_custom_proxy_input_dependent_control_flow, test/test_fx.py::TestFX::test_custom_proxy_type, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_custom_traceback_raised_when_exception_source_is_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graphmodule_with_transform, test/test_fx.py::TestFX::test_deepcopy_no_recursion, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_direct_param_use, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_ellipsis, test/test_fx.py::TestFX::test_empty_graph_codegen, test/test_fx.py::TestFX::test_enum, test/test_fx.py::TestFX::test_erase_node_error, 
test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotation_empty, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_shifts, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, test/test_fx.py::TestFX::test_graph_unique_names, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_inf_nan_kwds, test/test_fx.py::TestFX::test_informative_co_filename, test/test_fx.py::TestFX::test_inline_graph, test/test_fx.py::TestFX::test_insert_arg, test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_boxed_run_argument_validation, test/test_fx.py::TestFX::test_interpreter_default_args, test/test_fx.py::TestFX::test_interpreter_gc_values, test/test_fx.py::TestFX::test_interpreter_noop_resnet18, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_other_graph, test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_run_node_override, 
test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_lineno_map, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_metadata_on_ph, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_move_before, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_no_mutation, test/test_fx.py::TestFX::test_node_tagging, test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_pickle_graphmodule, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pickle_torch_custom_ops, test/test_fx.py::TestFX::test_prepend_does_not_leak, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_graph, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_print_graph, test/test_fx.py::TestFX::test_profiler_multiple_modules, test/test_fx.py::TestFX::test_profiler_nested_graph_modules, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_profiler_stack_trace_augmentation, test/test_fx.py::TestFX::test_proxy_deepcopy_with_tracer, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, test/test_fx.py::TestFX::test_pytree, test/test_fx.py::TestFX::test_pytree_concrete, 
test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_remove_uses_with_custom_filter, test/test_fx.py::TestFX::test_replace_input, test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_return_tuple, test/test_fx.py::TestFX::test_return_type_exists, test/test_fx.py::TestFX::test_return_type_exists_pre_pep585, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_aggregate, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_shape_prop_layout_3d, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_stack_traces_with_transformer, test/test_fx.py::TestFX::test_string_literal_return, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_custom_ops, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torch_fx_len, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_buffer_slice, test/test_fx.py::TestFX::test_trace_dict_int_keys, 
test/test_fx.py::TestFX::test_trace_dict_proxy_keys, test/test_fx.py::TestFX::test_trace_fn_constant, test/test_fx.py::TestFX::test_trace_function, test/test_fx.py::TestFX::test_trace_multiple_funcs, test/test_fx.py::TestFX::test_trace_return_dataclass, test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_typename_print_union, test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_unpack_dict_better_error, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_function, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_module, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap, test/test_fx.py::TestFX::test_wrap_decorated_function, test/test_fx.py::TestFX::test_wrap_fn_directly, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_method, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_wrong_target_type, test/test_fx.py::TestFX::test_wrong_topo, test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, 
test/test_fx.py::TestFXAPIBackwardCompatibility::test_class_member_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy_with_logits, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_channel_shuffle, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding_bag, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gaussian_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gelu, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grid_sample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_huber_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_instance_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_kl_div, test/test_fx.py::TestFunctionalTracing::test_nn_functional_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_layer_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_linear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_local_response_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_log_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mse_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pdist, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_poisson_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_prelu, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu6, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_smooth_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_nearest, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_H_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___getitem___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rdiv___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmod___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__chunk_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_allclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argwhere_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_baddbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_block_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bucketize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cfloat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_max_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_min_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_column_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_constant_pad_nd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_contiguous_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cov_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_deg2rad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagflat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_floor_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_double_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfftn_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hash_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hypot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igammac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isfinite_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isnan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_item_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ldexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lerp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_householder_product_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_slogdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svdvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vander_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log10_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mT_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_maximum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_movedim_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_msort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nan_to_num_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_glu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_selu_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_silu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_fro_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rand_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_like_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_real_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_renorm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_interleave_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize_as__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rot90_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_short_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signbit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinh_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_airy_ai_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_entr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_erfcx_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_log_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtri_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_spherical_bessel_j0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_xlog1py_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_zeta_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_list_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_square_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensor_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_unbiased_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_xlogy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_like_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet161, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet201, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_keypointrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssdlite320_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b4, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_m, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_75, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext50_32x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_1, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r2plus1d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r3d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2 2025-12-04T16:08:24.5459822Z 2025-12-04T16:08:24.5460114Z Finished test_fx 1/1 ... 
[2025-12-04 16:08:24.432331][24862.042235719], took 4.24min 2025-12-04T16:08:24.5461105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.xml 2025-12-04T16:08:24.5796081Z Running test_autocast 1/1 ... [2025-12-04 16:08:24.579363][24862.189272814] 2025-12-04T16:08:24.5796626Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:08:24.5799848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:24.579764] 2025-12-04T16:08:33.1058280Z 2025-12-04T16:08:33.1059180Z test_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autocast_1.1_7cd62703ceb14b05_.log 2025-12-04T16:08:33.1066904Z Running 20 items in this shard: test/test_autocast.py::TestAutocastCPU::test_autocast_disabled_with_fp32_dtype, test/test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_16, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_rnn, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_16, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_need_autocast_promote, test/test_autocast.py::TestAutocastCPU::test_cpu_autocast_deprecated_warning, test/test_autocast.py::TestAutocastCPU::test_generic_autocast, test/test_autocast.py::TestAutocastGPU::test_autocast_prioritize, test/test_autocast.py::TestAutocastGPU::test_cache_disabled, test/test_autocast.py::TestAutocastGPU::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_cast_cache_is_global, 
test/test_autocast.py::TestAutocastMPS::test_mps_autocast_bfloat16_supported, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_error_message, test/test_autocast.py::TestTorchAutocast::test_autocast_fast_dtype, test/test_autocast.py::TestTorchAutocast::test_invalid_device, test/test_autocast.py::TestTorchAutocast::test_non_string_device 2025-12-04T16:08:33.1074177Z 2025-12-04T16:08:33.1074471Z Finished test_autocast 1/1 ... [2025-12-04 16:08:33.105650][24870.715558004], took 0.14min 2025-12-04T16:08:33.1459953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.xml 2025-12-04T16:08:33.3108290Z Running test_logging 1/1 ... [2025-12-04 16:08:33.310565][24870.920474056] 2025-12-04T16:08:33.3108818Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:08:33.3112567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:33.310977] 2025-12-04T16:08:40.5854058Z 2025-12-04T16:08:40.5855019Z test_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_logging_1.1_4a28eee8affd86e2_.log 2025-12-04T16:08:40.5856032Z Running 1 items in this shard: test/test_logging.py::LoggingTest::testApiUsage 2025-12-04T16:08:40.5856461Z 2025-12-04T16:08:40.5856758Z Finished test_logging 1/1 ... [2025-12-04 16:08:40.585233][24878.195140856], took 0.12min 2025-12-04T16:08:40.6256420Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.xml 2025-12-04T16:08:40.6919390Z Running test_python_dispatch 1/1 ... 
[2025-12-04 16:08:40.691657][24878.301565813] 2025-12-04T16:08:40.6919980Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:08:40.6922727Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_python_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:40.692033] 2025-12-04T16:08:52.9734709Z 2025-12-04T16:08:52.9735866Z test_python_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_python_dispatch_1.1_4a43d809046600b7_.log 2025-12-04T16:08:52.9790687Z Running 119 items in this shard: test/test_python_dispatch.py::TestDispatcherPythonBindings::test_call_boxed, test/test_python_dispatch.py::TestPythonRegistration::test_alias_analysis, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_no_existing, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_with_existing, test/test_python_dispatch.py::TestPythonRegistration::test_dispatcher_error_filenames, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_eq, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_pickle, test/test_python_dispatch.py::TestPythonRegistration::test_error_for_unsupported_ns_or_kind, test/test_python_dispatch.py::TestPythonRegistration::test_error_if_fn_not_callable, test/test_python_dispatch.py::TestPythonRegistration::test_extend_library_with_dispatch_key_arg, test/test_python_dispatch.py::TestPythonRegistration::test_fallback, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_keyset, test/test_python_dispatch.py::TestPythonRegistration::test_fallthrough_for_dense_key_with_meta_in_tls, 
test/test_python_dispatch.py::TestPythonRegistration::test_finalizer, test/test_python_dispatch.py::TestPythonRegistration::test_override_aten_ops_with_multiple_libraries, test/test_python_dispatch.py::TestPythonRegistration::test_override_cpu_sum, test/test_python_dispatch.py::TestPythonRegistration::test_override_cuda_with_jiterator, test/test_python_dispatch.py::TestPythonRegistration::test_register_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_returning_symint, test/test_python_dispatch.py::TestPythonDispatch::test_all_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_autograd_in_attr, test/test_python_dispatch.py::TestPythonDispatch::test_basic, test/test_python_dispatch.py::TestPythonDispatch::test_capture_logs_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_construct_int_tensor, test/test_python_dispatch.py::TestPythonDispatch::test_custom_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_not_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_size_policy_dynamic_shapes, test/test_python_dispatch.py::TestPythonDispatch::test_data_ptr_respects_numel_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_non_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass_with_clone_returning_different_type, test/test_python_dispatch.py::TestPythonDispatch::test_detach_appears_once_when_called_once, test/test_python_dispatch.py::TestPythonDispatch::test_device_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dim_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call, 
test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call_list_arg, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_dont_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_uint64, test/test_python_dispatch.py::TestPythonDispatch::test_error_using_class_method_on_mode, test/test_python_dispatch.py::TestPythonDispatch::test_exception_handling, test/test_python_dispatch.py::TestPythonDispatch::test_fancy_strides, test/test_python_dispatch.py::TestPythonDispatch::test_format, test/test_python_dispatch.py::TestPythonDispatch::test_get_cur_mode, test/test_python_dispatch.py::TestPythonDispatch::test_get_mode_stack, test/test_python_dispatch.py::TestPythonDispatch::test_index_put_where_only_index_is_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_invalid_ret, test/test_python_dispatch.py::TestPythonDispatch::test_is_contiguous_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only_and_positional_default, test/test_python_dispatch.py::TestPythonDispatch::test_layout_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_like, test/test_python_dispatch.py::TestPythonDispatch::test_list_ret, test/test_python_dispatch.py::TestPythonDispatch::test_make_fx_with_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_make_subclass_with_modes, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_noalloc, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_propagates_metadata, test/test_python_dispatch.py::TestPythonDispatch::test_maybe_tuple_bug, test/test_python_dispatch.py::TestPythonDispatch::test_mode_detection, test/test_python_dispatch.py::TestPythonDispatch::test_mode_with_make_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_multiple_ops_subclass, 
test/test_python_dispatch.py::TestPythonDispatch::test_nested_push_logging_tensor_mode, test/test_python_dispatch.py::TestPythonDispatch::test_nesting_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_new_ones, test/test_python_dispatch.py::TestPythonDispatch::test_none_wrapping, test/test_python_dispatch.py::TestPythonDispatch::test_notimplemented_mode, test/test_python_dispatch.py::TestPythonDispatch::test_optional_tensor_list, test/test_python_dispatch.py::TestPythonDispatch::test_out, test/test_python_dispatch.py::TestPythonDispatch::test_produce_real_type, test/test_python_dispatch.py::TestPythonDispatch::test_record_stream, test/test_python_dispatch.py::TestPythonDispatch::test_return_and_correct_aliasing_gives_correct_stride, test/test_python_dispatch.py::TestPythonDispatch::test_return_stream, test/test_python_dispatch.py::TestPythonDispatch::test_set_data, test/test_python_dispatch.py::TestPythonDispatch::test_shallow_copy_and_detach, test/test_python_dispatch.py::TestPythonDispatch::test_sizes_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_standard_is_not_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_storage, test/test_python_dispatch.py::TestPythonDispatch::test_storage_can_be_converted_to_python_object, test/test_python_dispatch.py::TestPythonDispatch::test_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_creation, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_priority, test/test_python_dispatch.py::TestPythonDispatch::test_sym_sizes_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_tolist_numpy_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_basic, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_respects_no_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_subclass_priority, 
test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_unrelated_tensors, test/test_python_dispatch.py::TestPythonDispatch::test_version, test/test_python_dispatch.py::TestPythonDispatch::test_view_returns_alias_under_torch_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_with_mode_created_separately, test/test_python_dispatch.py::TestPythonDispatch::test_with_nested_modes, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_extra_dispatch_keys, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_multiprocessing_preserves_dtype, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_reentrant_dispatch_with_mode, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_serializes, test/test_python_dispatch.py::TestPythonDispatcher::test_basic, test/test_python_dispatch.py::TestPythonDispatcher::test_lstsq, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_cat_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_conv2d_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCatCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCubeCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulScalarCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNMSCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNonzeroCustomOp_cuda_float32, 
test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySortCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyTakeCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyViewCopyCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_fft_fft2_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_mul_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_native_batch_norm_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_out_op_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_list_args_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_view_cuda_float32 2025-12-04T16:08:52.9843780Z 2025-12-04T16:08:52.9844179Z Finished test_python_dispatch 1/1 ... [2025-12-04 16:08:52.973466][24890.583372604], took 0.20min 2025-12-04T16:08:53.0142764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.xml 2025-12-04T16:08:53.0870231Z Running nn/test_lazy_modules 1/1 ... 
[2025-12-04 16:08:53.086762][24890.696669953] 2025-12-04T16:08:53.0870782Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:08:53.0874034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_lazy_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:53.087181] 2025-12-04T16:09:00.6124373Z 2025-12-04T16:09:00.6125418Z nn/test_lazy_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_lazy_modules_1.1_641ede76abd1387b_.log 2025-12-04T16:09:00.6147852Z Running 59 items in this shard: test/nn/test_lazy_modules.py::TestLazyModules::test_chained_initialization, test/nn/test_lazy_modules.py::TestLazyModules::test_invalid_functions, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm_with_dict_input, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d_pickle, 
test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_kwargs, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_kwargs, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_kwargs, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transposed1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_forward_hook, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d_pickle, 
test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_linear_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_linear_state_and_forward, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_buffer, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_jit_buffer, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_jit_param, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_parameter, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_pre_forward_hook, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_share_memory_buffer, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_share_memory_param, test/nn/test_lazy_modules.py::TestLazyModules::test_linear, test/nn/test_lazy_modules.py::TestLazyModules::test_linear_state, test/nn/test_lazy_modules.py::TestLazyModules::test_materialize_device, test/nn/test_lazy_modules.py::TestLazyModules::test_materialize_dtype, test/nn/test_lazy_modules.py::TestLazyModules::test_optimizer_pass, test/nn/test_lazy_modules.py::TestLazyModules::test_spectral_norm, test/nn/test_lazy_modules.py::TestLazyModules::test_weight_norm 2025-12-04T16:09:00.6169373Z 2025-12-04T16:09:00.6169701Z Finished nn/test_lazy_modules 1/1 ... [2025-12-04 16:09:00.612299][24898.222207809], took 0.13min 2025-12-04T16:09:00.6536884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.xml 2025-12-04T16:09:00.7620411Z Running nn/test_pruning 1/1 ... 
[2025-12-04 16:09:00.761781][24898.371689445] 2025-12-04T16:09:00.7620934Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:00.7624365Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:00.762178] 2025-12-04T16:09:06.1844050Z 2025-12-04T16:09:06.1844989Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_fc4532e556fbe9d9_.log 2025-12-04T16:09:06.1857582Z Running 34 items in this shard: test/nn/test_pruning.py::TestPruningNN::test_compute_nparams_to_prune, test/nn/test_pruning.py::TestPruningNN::test_custom_from_mask_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_identity_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning_with_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_multiple_pruning_calls, test/nn/test_pruning.py::TestPruningNN::test_prune, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores_mimic_default, test/nn/test_pruning.py::TestPruningNN::test_pruning_container, test/nn/test_pruning.py::TestPruningNN::test_pruning_container_compute_mask, test/nn/test_pruning.py::TestPruningNN::test_pruning_id_consistency, test/nn/test_pruning.py::TestPruningNN::test_pruning_rollback, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_model, 
test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_state_dict, test/nn/test_pruning.py::TestPruningNN::test_random_pruning, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_0perc, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_new_weight, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_orig, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_pickle, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_sizes, test/nn/test_pruning.py::TestPruningNN::test_random_structured_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_exception, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_rnn_pruning, test/nn/test_pruning.py::TestPruningNN::test_unstructured_pruning_same_magnitude, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount_init 2025-12-04T16:09:06.1869596Z 2025-12-04T16:09:06.1869908Z Finished nn/test_pruning 1/1 ... [2025-12-04 16:09:06.184249][24903.794157783], took 0.09min 2025-12-04T16:09:06.2253781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.xml 2025-12-04T16:09:06.2719161Z Running test_monitor 1/1 ... [2025-12-04 16:09:06.271644][24903.88155216] 2025-12-04T16:09:06.2719685Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:06.2723002Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_monitor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:09:06.272033] 2025-12-04T16:09:11.7942491Z 2025-12-04T16:09:11.7943409Z test_monitor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_monitor_1.1_60acff8e80cf96a3_.log 2025-12-04T16:09:11.7945768Z Running 6 items in this shard: test/test_monitor.py::TestMonitor::test_event_handler, test/test_monitor.py::TestMonitor::test_fixed_count_stat, test/test_monitor.py::TestMonitor::test_interval_stat, test/test_monitor.py::TestMonitor::test_log_event, test/test_monitor.py::TestMonitor::test_wait_counter, test/test_monitor.py::TestMonitorTensorboard::test_event_handler 2025-12-04T16:09:11.7947551Z 2025-12-04T16:09:11.7947835Z Finished test_monitor 1/1 ... [2025-12-04 16:09:11.794058][24909.403967592], took 0.09min 2025-12-04T16:09:11.8352709Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.xml 2025-12-04T16:09:11.8791658Z Running test_cuda_sanitizer 1/1 ... [2025-12-04 16:09:11.878949][24909.488857898] 2025-12-04T16:09:11.8792196Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:11.8795907Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_sanitizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:09:11.879344] 2025-12-04T16:09:19.1537487Z 2025-12-04T16:09:19.1538532Z test_cuda_sanitizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_sanitizer_1.1_06ff5e3bcde71deb_.log 2025-12-04T16:09:19.1549874Z Running 31 items in this shard: test/test_cuda_sanitizer.py::TestArgumentHandler::test_add, test/test_cuda_sanitizer.py::TestArgumentHandler::test_cat, test/test_cuda_sanitizer.py::TestArgumentHandler::test_inplace, test/test_cuda_sanitizer.py::TestArgumentHandler::test_nonzero, test/test_cuda_sanitizer.py::TestArgumentHandler::test_out, test/test_cuda_sanitizer.py::TestArgumentHandler::test_split, test/test_cuda_sanitizer.py::TestArgumentHandler::test_tensor_names, test/test_cuda_sanitizer.py::TestEventHandler::test_all_reads_checked_failing, test/test_cuda_sanitizer.py::TestEventHandler::test_all_reads_checked_passing, test/test_cuda_sanitizer.py::TestEventHandler::test_branch_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_chain_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_correct_state_merging, test/test_cuda_sanitizer.py::TestEventHandler::test_deleted_record, test/test_cuda_sanitizer.py::TestEventHandler::test_device_synchronization_expired, test/test_cuda_sanitizer.py::TestEventHandler::test_device_synchronize, test/test_cuda_sanitizer.py::TestEventHandler::test_empty_kernel_launch, test/test_cuda_sanitizer.py::TestEventHandler::test_event_synchronize, test/test_cuda_sanitizer.py::TestEventHandler::test_expired_record, test/test_cuda_sanitizer.py::TestEventHandler::test_multiple_errors, test/test_cuda_sanitizer.py::TestEventHandler::test_multiple_wait, test/test_cuda_sanitizer.py::TestEventHandler::test_new_stream_is_synchronized, test/test_cuda_sanitizer.py::TestEventHandler::test_reads_check_last_write, test/test_cuda_sanitizer.py::TestEventHandler::test_record_override, test/test_cuda_sanitizer.py::TestEventHandler::test_simple_error, 
test/test_cuda_sanitizer.py::TestEventHandler::test_simple_passing, test/test_cuda_sanitizer.py::TestEventHandler::test_simple_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_stream_synchronize, test/test_cuda_sanitizer.py::TestMessages::test_ensure_does_not_exist, test/test_cuda_sanitizer.py::TestMessages::test_ensure_exists, test/test_cuda_sanitizer.py::TestMessages::test_error_message, test/test_cuda_sanitizer.py::TestMessages::test_subclass 2025-12-04T16:09:19.1560473Z 2025-12-04T16:09:19.1560817Z Finished test_cuda_sanitizer 1/1 ... [2025-12-04 16:09:19.153598][24916.763505435], took 0.12min 2025-12-04T16:09:19.1950515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.xml 2025-12-04T16:09:19.2776051Z Running test_bundled_inputs 1/1 ... [2025-12-04 16:09:19.277309][24916.887217795] 2025-12-04T16:09:19.2776609Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:19.2779464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_bundled_inputs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:09:19.277709] 2025-12-04T16:09:25.4516949Z 2025-12-04T16:09:25.4517912Z test_bundled_inputs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_bundled_inputs_1.1_395d728a16287961_.log 2025-12-04T16:09:25.4523600Z Running 12 items in this shard: test/test_bundled_inputs.py::TestBundledInputs::test_bad_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_dict_args, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_fail, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_non_mutator, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_success, test/test_bundled_inputs.py::TestBundledInputs::test_large_tensor_with_inflation, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_both_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_neither_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_non_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_rejected_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_single_tensors 2025-12-04T16:09:25.4528471Z 2025-12-04T16:09:25.4528797Z Finished test_bundled_inputs 1/1 ... [2025-12-04 16:09:25.451548][24923.061456647], took 0.10min 2025-12-04T16:09:25.4934070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.xml 2025-12-04T16:09:25.5752465Z Running torch_np/numpy_tests/core/test_numeric 1/1 ... 
[2025-12-04 16:09:25.574957][24923.184864927] 2025-12-04T16:09:25.5753136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:25.5756592Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_numeric.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:25.575415] 2025-12-04T16:09:35.4033612Z 2025-12-04T16:09:35.4034848Z torch_np/numpy_tests/core/test_numeric 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_numeric_1.1_c2ce2dbd13566161_.log 2025-12-04T16:09:35.4150979Z Running 273 items in this shard: test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_copies, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_negative_resize, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_repeats, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_reshape_from_zero, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_zeroresize, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_choose, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_clip, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_compress, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_count_nonzero, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_cumproduct, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_diagonal, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_accuracy, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype2, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype3, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype4, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype5, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype6, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype7, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-1, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-10, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-9, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_mean, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_prod, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_ptp, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_ravel, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_repeat, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_reshape, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round_2, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round_py_consistency, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_searchsorted, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_size, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_squeeze, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_std, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_sum, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_swapaxes, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_take, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_trace, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_transpose, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_var, test/torch_np/numpy_tests/core/test_numeric.py::TestIsscalar::test_isscalar, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_and_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_and_is, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_or_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_or_is, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_xor_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_xor_is, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_logical, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_all_any, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_logical_and_or_xor, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_logical_not_abs, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolCmp::test_double, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolCmp::test_float, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_default, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_divide_err, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_errobj, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_set, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_D, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_F, 
test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_d, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_e, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_f, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_warnings, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast_2, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast_values, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_coercion, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_coercion_2, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_promote_types_endian, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_result_type, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_tesult_type_2, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_2592_dtype0_count_10_error_index_5, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_2592_dtype0_count_10_error_index_9, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_empty_result, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_failed_itemsetting, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_lengths, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_too_few_items, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_types, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_values, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_?, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_B, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_D, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_F, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_b, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_d, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_e, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_f, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_h, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_i, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_l, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_list, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_countnonzero_axis_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_countnonzero_keepdims, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_onedim, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_onedim_differs, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_trivial, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_trivial_differs, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_twodim, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_zerod, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_zerod_differs, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_sparse, test/torch_np/numpy_tests/core/test_numeric.py::TestIndex::test_boolean, test/torch_np/numpy_tests/core/test_numeric.py::TestIndex::test_boolean_edgecase, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_large_neg_int64, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_neg_width_boundaries, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_negative, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_positive, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_sufficient_width, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_zero, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_base3, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_base_range, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_negative, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_positive, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equal, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equal_equal_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equiv, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_none_compares_elementwise, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_array_double, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_complex, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_func_takes_out, 
test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_inplace_array, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_inplace_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_non_contig, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_property, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_scalar_nan_propagation_arr0_amin0_amax0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin2_amax2, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin_1_amax1, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin_1_amax_0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_array_int32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_array_outint32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_memory_overlap, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple2, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple_int32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_transposed, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_noncontig_inplace, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_D, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_F, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_e, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_?, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_B, 
test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_b, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_d, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_f, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_h, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_i, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_l, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_complex, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_double, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_inplace_01, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_inplace_02, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_inout_casting0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_inout_casting_unsafe, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int64_inout, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int64_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_nonnative, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_01, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_02, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_03, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_04, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_05, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_06, 
test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_07, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_08, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_09, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_10, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_11, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_12, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_equalnan, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_ip_allclose, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_ip_not_allclose, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_min_int, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_no_parameter_modification, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_equal_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_all_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_isclose_allclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_none_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_no_parameter_modification, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_non_finite_scalar, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_scalar_return, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_basic, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_ddof1, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_ddof2, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_out_scalar, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_scalars, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVarComplex::test_basic, 
test/torch_np/numpy_tests/core/test_numeric.py::TestStdVarComplex::test_scalars, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_for_reference_leak, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_full, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_ones, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_zeros, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc0_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc0_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc1_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc1_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc2_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc2_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc3_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc3_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_empty_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_filled_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_ones_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_zeros_like, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_complex, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_float, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_mode, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_no_overwrite, 
test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_zero_size, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_mode, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_no_overwrite, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_numpy_doc_examples, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_object, test/torch_np/numpy_tests/core/test_numeric.py::TestDtypePositional::test_dtype_positional, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_2D, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_list, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_0, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_1, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_2, test/torch_np/numpy_tests/core/test_numeric.py::TestStringFunction::test_set_string_function, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll1d, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll2d, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestRollaxis::test_exceptions, test/torch_np/numpy_tests/core/test_numeric.py::TestRollaxis::test_results, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_errors, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_multiples, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_new_position, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_to_end, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_preserve_order, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_2x2, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_2x3, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_3x3, 
test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_broadcasting, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_broadcasting_shapes, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_uint8_int32_mixed_dtypes, test/torch_np/numpy_tests/core/test_numeric.py::TestOuterMisc::test_outer_out_param, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_scalar_input, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_single_input, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_sparse, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_C_and_F_simul, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_non_array_input, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_require_each, 
test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_unknown_requirement, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_error_kwargs, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_in_args, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_single_arg, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_number_of_arguments, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_shape_mismatch_error_message, test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimension, test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimension_einsum, test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimensional 2025-12-04T16:09:35.4265500Z 2025-12-04T16:09:35.4265930Z Finished torch_np/numpy_tests/core/test_numeric 1/1 ... [2025-12-04 16:09:35.403600][24933.013507384], took 0.16min 2025-12-04T16:09:35.4452979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.xml 2025-12-04T16:09:35.5310561Z Running torch_np/numpy_tests/core/test_multiarray 1/1 ... [2025-12-04 16:09:35.530816][24933.140725026] 2025-12-04T16:09:35.5311211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:09:35.5314885Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:09:35.531226] 2025-12-04T16:10:09.9430715Z 2025-12-04T16:10:09.9432054Z torch_np/numpy_tests/core/test_multiarray 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.1_f5a85c7d65f3960a_.log 2025-12-04T16:10:09.9860071Z Running 864 items in this shard: test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_otherflags, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag__warn_on_write_flag_value_True_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_False_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_True_writeable_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_string_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_void_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_warnonwrite, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_any_base, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_buffer, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_pickle, test/torch_np/numpy_tests/core/test_multiarray.py::TestHash::test_int, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_dtypeattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_max_uint64, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_struct_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_set_stridesattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_stridesattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_0d_array_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asanyarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_ascontiguousarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_cont, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asanyarray, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_ascontiguousarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_broadcasting, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_cast_to_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_longdouble_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_stringlike_empty_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_unicode_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestDtypedescr::test_construction, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_ellipsis_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_empty_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_newaxis, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_newaxis, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_overlapping_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_of_ragged_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_deep_nonragged_object, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_empty_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_failed_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_iterable, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_attribute, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_malloc_fails, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_no_len_object_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_non_sequence_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_ndim_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_shape_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_sequence_non_homogeneous, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_arr, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_too_big_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_like_like_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_bytes, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_unaligned, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum_2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_test_interning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__should_not_work, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__deepcopy___dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_all_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_any_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_D, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_compress, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_copy, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_view_notwriteable, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot_out_mem_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_matmul_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_i, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_fuzz, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_iterative, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_d, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_prod, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_put, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_ravel, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_repeat, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_reshape, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_round, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_default_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f16, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f32, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_n_elements, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_resetting, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_unaligned_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_invalid_sorter, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_sorter, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_size_zero_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_nans, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_degraded, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_size_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype3, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_squeeze, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_swapaxes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_trace, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_transpose, test/torch_np/numpy_tests/core/test_multiarray.py::TestCequenceMethods::test_array_contains, test/torch_np/numpy_tests/core/test_multiarray.py::TestBinop::test_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestSubscripting::test_test_zero_rank, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_tuple, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data11, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data17, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data22, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data37, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data4, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data42, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data44, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data51, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data57, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data6, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data7, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_maximum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data11, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data17, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data22, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data37, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data4, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data42, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data44, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data51, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data57, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data6, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data7, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_minimum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestNewaxis::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_max_or_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_truncate, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_mask_size, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_overlaps, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_record_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_clip, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_out_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape0, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape1, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape2, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_wrap, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype4, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype7, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_datetime, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_invalid_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_mixed, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_ascii, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_big_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_bool_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype_bool, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_text, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_tofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_bad_dup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_offset, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_subarray_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromstring_count0, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_inf, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_int64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_buffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_unbuffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_largish_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_load_object_array_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_long_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_malformed, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_numbers, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_parsing_subarray_unsupported, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_read_shorter_than_count_subarray, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_binary_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_dump_pathlib, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_repr, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_cleanup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_uint64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_unseekable_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj_12345678, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_mmap_close, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_0d_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_reference, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_weakref, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_empty_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_freeform_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_int_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_invalid_arguments, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_none_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_zeros_appended, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_input, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_keepdims, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_float16, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_python_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_where, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_byteorder, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex128_ndec_7, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex64_ndec_6, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_dimensions, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_accelerate_framework_sgemv_fix, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_2args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect1, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatmat, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat3, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecinner, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecouter, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv11, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv12, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN7, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN8, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN9, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvn10, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_empty_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_bool, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_add, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_multiply, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_arg, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_array_priority_override, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_axes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_raises, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types_2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_3d_tensor, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_reversed_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_with_various_contiguities, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_scalar_and_vector, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_vecself, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops0, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops3, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_axis_spec, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestWarnings::test_complex_warning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_float, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_nonscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_usigned_shortshort, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_byteorder_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_char_vs_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_field_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_intra_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_padding_with_array_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_trailing_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_unnamed_fields, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test___array__, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_array_interfaces, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_buffer_interface, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_compatible_cast, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_A, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_K, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_scalars, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_striding_not_ok, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_not_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_not_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestDelMisc::test_flat_element_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_array_scalar_relational_operation, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_bool_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_dtype_mix, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_empty_result, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_foreign, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_largedim, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_ndim, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_arrays_not_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_collections_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_0d, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_no_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmax_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmin_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_choose_mod_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_dot_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_flatiter__array__, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_insert_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_put_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_putmask_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_take_mode_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_arange_booleans, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_infinite, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_nan_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_start_stop_kwarg, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_zero_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestRichcompareScalar::test_richcompare_scalar_boolean_singleton_return, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_1023, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_128, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_151, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_16, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_191, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_2047, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_24, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_256, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_32, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_383, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_48, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_512, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_64, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_8, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_96, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_int 2025-12-04T16:10:10.0275692Z 2025-12-04T16:10:10.0276171Z Finished torch_np/numpy_tests/core/test_multiarray 1/1 ... [2025-12-04 16:10:09.944262][24967.554168142], took 0.57min 2025-12-04T16:10:10.0277723Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.xml 2025-12-04T16:10:10.0813633Z Running test_itt 1/1 ... 
[2025-12-04 16:10:10.081057][24967.690964227]
2025-12-04T16:10:10.0814146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:10.0817471Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_itt.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:10.081494]
2025-12-04T16:10:15.4536018Z
2025-12-04T16:10:15.4536904Z test_itt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_itt_1.1_0c67806275155360_.log
2025-12-04T16:10:15.4538042Z Running 1 items in this shard: test/test_itt.py::TestItt::test_itt
2025-12-04T16:10:15.4538433Z
2025-12-04T16:10:15.4538693Z Finished test_itt 1/1 ... [2025-12-04 16:10:15.453395][24973.063305367], took 0.09min
2025-12-04T16:10:15.4959467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.xml
2025-12-04T16:10:15.5192739Z Running torch_np/numpy_tests/lib/test_function_base 1/1 ... [2025-12-04 16:10:15.518995][24973.128903197]
2025-12-04T16:10:15.5193409Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:15.5196840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:10:15.519423] 2025-12-04T16:10:23.3949570Z 2025-12-04T16:10:23.3950975Z torch_np/numpy_tests/lib/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_function_base_1.1_66e1a2bc19dbe7b5_.log 2025-12-04T16:10:23.4203395Z Running 505 items in this shard: test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_rotation_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_4d, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_lr, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_ud, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_default_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_multiple_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_average_class_without_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x0_axis0_expected_avg0_weights0_expected_wavg0_expected_wsum0, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x1_axis_0_expected_avg1_weights1_expected_wavg1_expected_wsum1, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_returned, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_upcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_broadcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_deprecated_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_many_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_non_bool_deprecation, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_return_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_array_copied, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_-4, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_4, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_multidim, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmax::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmin::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPtp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumsum::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestProd::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumprod::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_append, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_n, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_prepend, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_array_order_preserve, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_fancy, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_[1], test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_array([1]), test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_non_int, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_slices, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_args, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_badargs, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_decreasing_unsigned_int_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_inexact_dtypes, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_second_order_accurate, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_spacing, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_specific_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_decreasing_unsigned_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype1, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestAngle::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_all_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_leading_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_list_to_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_no_trim, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_overflow_arr0, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_size_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_trailing_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_both, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_place, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_casting_error, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_forward, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_decreasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_increasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_monotonic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_ndim, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_dtype_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_bias, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_complex, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type2, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_extreme, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_non_array, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_rowvar, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_variance, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type2, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_fweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_unit_fweights_and_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_wrong_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_simple, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_int_beta, test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMsort::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_indexing, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_invalid_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_indexing, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_shape, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_no_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_return_type, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_single_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_sparse, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_writeback, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_0d_condition, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_comparison, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_default, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_multidimensional_extrafunc, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_scalar_domains_three_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_two_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_dtype_reference_leaks, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals0, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_incorrect_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_and_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_smaller_than_maxvalue, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_complex_interp, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_exceptions, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_if_len_x_is_small, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_behavior_exact_x, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_period, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_right_left_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_scalar_interpolation_point, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_zero_dimensional_interpolation_point, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_2D, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_api, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_exception, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis4, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_extrapolation, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_closest_observation_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_inverted_cdf_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_normal_unbiased_expected_27_125, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_closest_observation_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_inverted_cdf_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_normal_unbiased_expected_27_125, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_l, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_q, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_empty_dim, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_no_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_sequence, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_correct_quantile_value, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_max_ulp, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_hypo, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_averaged_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_closest_observation, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_hazen, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_higher, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_interpolated_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_linear, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_lower, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_median_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_midpoint, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_nearest, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_normal_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_weibull, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_scalar_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_axis_keyword, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_2, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_3, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_overwrite_keyword, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_B_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_H_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_b_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_g_type_out_G, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_h_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_l_type_out_D
2025-12-04T16:10:23.4451025Z
2025-12-04T16:10:23.4451514Z Finished torch_np/numpy_tests/lib/test_function_base 1/1 ... [2025-12-04 16:10:23.395619][24981.005524864], took 0.13min
2025-12-04T16:10:23.4453074Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.xml
2025-12-04T16:10:23.5417534Z Running test_masked 1/1 ... [2025-12-04 16:10:23.541442][24981.151348803]
2025-12-04T16:10:23.5418066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:23.5421573Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_masked.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:23.541888]
2025-12-04T16:10:59.0552801Z
2025-12-04T16:10:59.0553800Z test_masked 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_masked_1.1_f4f98418cc401a0c_.log
2025-12-04T16:10:59.0636278Z Running 194 items in this shard: test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_bfloat16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int32, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bfloat16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float16, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int32, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float32, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int16, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_123_cuda 2025-12-04T16:10:59.0716093Z 2025-12-04T16:10:59.0716398Z Finished test_masked 1/1 ... 
[2025-12-04 16:10:59.055443][25016.665349331], took 0.59min 2025-12-04T16:10:59.0988297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.xml 2025-12-04T16:10:59.1850030Z Running optim/test_lrscheduler 1/1 ... [2025-12-04 16:10:59.184720][25016.794627735] 2025-12-04T16:10:59.1850612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:10:59.1853895Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_lrscheduler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:59.185153] 2025-12-04T16:11:03.8578400Z 2025-12-04T16:11:03.8579422Z optim/test_lrscheduler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_lrscheduler_1.1_50b469a96bd12a6b_.log 2025-12-04T16:11:03.8580227Z 2025-12-04T16:11:03.8580594Z Finished optim/test_lrscheduler 1/1 ... [2025-12-04 16:11:03.857627][25021.467534489], took 0.08min 2025-12-04T16:11:03.9006730Z Running test_datapipe 1/1 ... [2025-12-04 16:11:03.900422][25021.510330343] 2025-12-04T16:11:03.9007267Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:11:03.9011318Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_datapipe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:11:03.900829] 2025-12-04T16:11:25.8454842Z 2025-12-04T16:11:25.8455801Z test_datapipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_datapipe_1.1_628e5e9adba39130_.log 2025-12-04T16:11:25.8491695Z Running 93 items in this shard: test/test_datapipe.py::TestDataChunk::test_as_string, test/test_datapipe.py::TestDataChunk::test_getitem, test/test_datapipe.py::TestDataChunk::test_iter, test/test_datapipe.py::TestDataChunk::test_len, test/test_datapipe.py::TestDataChunk::test_random_shuffle, test/test_datapipe.py::TestDataChunk::test_reverse, test/test_datapipe.py::TestDataChunk::test_sort, test/test_datapipe.py::TestStreamWrapper::test_api, test/test_datapipe.py::TestStreamWrapper::test_dir, test/test_datapipe.py::TestStreamWrapper::test_pickle, test/test_datapipe.py::TestStreamWrapper::test_repr, test/test_datapipe.py::TestIterableDataPipeBasic::test_demux_mux_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_groupby_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfiles_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfilesdeterministic_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_map_with_col_file_handle_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_openfilesfromdisk_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_routeddecoder_iterable_datapipe, test/test_datapipe.py::TestCaptureDataFrame::test_basic_capture, test/test_datapipe.py::TestDataFramesPipes::test_batch, test/test_datapipe.py::TestDataFramesPipes::test_capture, test/test_datapipe.py::TestDataFramesPipes::test_collate, test/test_datapipe.py::TestDataFramesPipes::test_filter, test/test_datapipe.py::TestDataFramesPipes::test_shuffle, test/test_datapipe.py::TestDataFramesPipes::test_unbatch, test/test_datapipe.py::TestFunctionalIterDataPipe::test_batch_iterdatapipe, 
test/test_datapipe.py::TestFunctionalIterDataPipe::test_collate_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_concat_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_demux_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalIterDataPipe::test_filter_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_fork_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_iterable_wrapper_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_dict_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_tuple_list_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_mux_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_sampler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalIterDataPipe::test_shuffler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_stream_reader_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_unbatch_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_zip_iterdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_batch_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_concat_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalMapDataPipe::test_map_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_sequence_wrapper_datapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalMapDataPipe::test_shuffler_mapdatapipe, 
test/test_datapipe.py::TestFunctionalMapDataPipe::test_zip_mapdatapipe, test/test_datapipe.py::TestTyping::test_compile_time, test/test_datapipe.py::TestTyping::test_construct_time, test/test_datapipe.py::TestTyping::test_isinstance, test/test_datapipe.py::TestTyping::test_issubinstance, test/test_datapipe.py::TestTyping::test_protocol, test/test_datapipe.py::TestTyping::test_reinforce, test/test_datapipe.py::TestTyping::test_runtime, test/test_datapipe.py::TestTyping::test_subtype, test/test_datapipe.py::TestGraph::test_simple_traverse, test/test_datapipe.py::TestGraph::test_traverse_circular_datapipe, test/test_datapipe.py::TestGraph::test_traverse_forked, test/test_datapipe.py::TestGraph::test_traverse_mapdatapipe, test/test_datapipe.py::TestGraph::test_traverse_mixdatapipe, test/test_datapipe.py::TestGraph::test_traverse_unhashable_datapipe, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_iter, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_map, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_dill, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_pickle, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding_with_old_dataloader, test/test_datapipe.py::TestSharding::test_multi_sharding, test/test_datapipe.py::TestSharding::test_old_dataloader, test/test_datapipe.py::TestSharding::test_sharding_groups, test/test_datapipe.py::TestSharding::test_sharding_length, test/test_datapipe.py::TestSharding::test_simple_sharding, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_buggy, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_constraint_multiple_outputs, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_generator, 
test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_new_object, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_self_next, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_return_self, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_non_generator, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_self_next, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_repeated, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_with_serialization 2025-12-04T16:11:25.8527107Z 2025-12-04T16:11:25.8527400Z Finished test_datapipe 1/1 ... [2025-12-04 16:11:25.845371][25043.455281121], took 0.37min 2025-12-04T16:11:25.8880824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.xml 2025-12-04T16:11:25.9662386Z Running nn/test_convolution 1/1 ... [2025-12-04 16:11:25.965977][25043.575885447] 2025-12-04T16:11:25.9662927Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:11:25.9666243Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_convolution.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:11:25.966381] 2025-12-04T16:12:11.7521007Z 2025-12-04T16:12:11.7524083Z nn/test_convolution 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_convolution_1.1_d98f421ddfbea09e_.log 2025-12-04T16:12:11.7939931Z Running 606 items in this shard: test/nn/test_convolution.py::TestConvolutionNN::test_Conv1d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_1x1, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_OneDNN, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias_v2, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_with_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_without_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_missing_argument, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_wbias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_half_cublas_gemm, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size_downsample_upsample, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose3d_correct_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv1d_issue_120547, test/nn/test_convolution.py::TestConvolutionNN::test_conv2d_discontiguous_weight, 
test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_issue_120406, test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values, test/nn/test_convolution.py::TestConvolutionNN::test_conv_aten_invalid_groups, test/nn/test_convolution.py::TestConvolutionNN::test_conv_backcompat, test/nn/test_convolution.py::TestConvolutionNN::test_conv_cudnn_memory_layout_dominance, test/nn/test_convolution.py::TestConvolutionNN::test_conv_invalid_groups, test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv_padding_mode, test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck, test/nn/test_convolution.py::TestConvolutionNN::test_conv_tbc, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_non_contiguous, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_noncontiguous_weight, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_not_mutate_stride, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grouped_conv_cudnn_nhwc_support, test/nn/test_convolution.py::TestConvolutionNN::test_huge_padding, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d, 
test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_nnpack_conv, test/nn/test_convolution.py::TestConvolutionNN::test_permute_conv2d_issue_120211, test/nn/test_convolution.py::TestConvolutionNN::test_thnn_conv_strided_padded_dilated, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose3d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_contig_wrong_stride_cudnn_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_no_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_float32, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_cudnn_broken_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_float64, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_contiguous_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_mismatch_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_cuda_float64, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_groups_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_no_bias_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_stride_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_strided_with_3D_input_and_weight_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_ic1_channels_last_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_batch_1_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_nosplit_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_and_bias_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose2d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose3d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transposed_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv2d_weight_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv3d_weight_memory_format_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_depthwise_conv_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_conv_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float64 2025-12-04T16:12:11.8350007Z 2025-12-04T16:12:11.8350380Z Finished nn/test_convolution 1/1 ... [2025-12-04 16:12:11.753100][25089.363006733], took 0.76min 2025-12-04T16:12:11.8351579Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.xml 2025-12-04T16:12:11.9335512Z Running test_indexing 1/1 ... [2025-12-04 16:12:11.933279][25089.543186008] 2025-12-04T16:12:11.9337817Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:12:11.9339534Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:12:11.933705] 2025-12-04T16:12:40.8373849Z 2025-12-04T16:12:40.8374805Z test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_indexing_1.1_2824065dc4dc1509_.log 2025-12-04T16:12:40.8443630Z Running 186 items in this shard: test/test_indexing.py::TestIndexingCUDA::test_advancedindex_big_cuda, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_basic_advanced_combined_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_tensor_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_cpu_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_cuda_broadcast_index_use_deterministic_algorithms_cuda, test/test_indexing.py::TestIndexingCUDA::test_ellipsis_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_bool_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_slice_cuda, test/test_indexing.py::TestIndexingCUDA::test_errors_index_copy_cuda, test/test_indexing.py::TestIndexingCUDA::test_gather_take_along_dim_cross_device_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_getitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_add_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bfloat16, 
test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bool, 
test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_getitem_copy_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_ind_dtype_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_limits_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_duplicate_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_empty_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_expanded_values_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_large_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_non_contiguous_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_deterministic_with_optional_tensors_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_large_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_non_accumulate_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex64, 
test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e5m2, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_bfloat16, 
test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_scalar_with_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float32, 
test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_setitem_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_int_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_broadcast_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_device_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_jit_indexing_cuda, test/test_indexing.py::TestIndexingCUDA::test_list_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_byte_mask_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_multiple_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_none_cuda, test/test_indexing.py::TestIndexingCUDA::test_out_of_bound_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_set_item_to_scalar_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_expansion_error_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_single_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_cuda, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_unravel_index_errors_cuda, test/test_indexing.py::TestIndexingCUDA::test_variable_slicing_cuda, test/test_indexing.py::TestIndexingCUDA::test_zero_dim_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_assignment_value_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_alldims_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_onedim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_twodim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_tensors_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_list_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_shape_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broadcast_subspace_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broaderrors_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_ellipsis_index_cuda, 
test/test_indexing.py::NumpyTestsCUDA::test_empty_fancy_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_tuple_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_everything_returns_views_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_is_larger_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_no_floats_cuda, test/test_indexing.py::NumpyTestsCUDA::test_none_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_bool_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_int_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_trivial_fancy_out_of_bounds_cuda, test/test_indexing.py::NumpyTestsCUDA::test_truncate_leading_1s_cuda 2025-12-04T16:12:40.8511071Z 2025-12-04T16:12:40.8511372Z Finished test_indexing 1/1 ... [2025-12-04 16:12:40.837460][25118.447367855], took 0.48min 2025-12-04T16:12:40.8799781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.xml 2025-12-04T16:12:40.9790836Z Running torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 16:12:40.978776][25118.588683517] 2025-12-04T16:12:40.9791711Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:12:40.9794507Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_pocketfft.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:12:40.979191] 2025-12-04T16:12:49.3062228Z 2025-12-04T16:12:49.3063516Z torch_np/numpy_tests/fft/test_pocketfft 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_5bba81624a9a4669_.log 2025-12-04T16:12:49.3100387Z Running 79 items in this shard: test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTShift::test_fft_n, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_all_1d_norm_preserving, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft0, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft0, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft0, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_hfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_identity, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_backward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_forward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_ortho, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ihfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_ifft, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_rfft 2025-12-04T16:12:49.3136941Z 2025-12-04T16:12:49.3137376Z Finished torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 16:12:49.306159][25126.916065829], took 0.14min 2025-12-04T16:12:49.3493214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.xml 2025-12-04T16:12:49.4339597Z Running torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 16:12:49.433621][25127.043529022] 2025-12-04T16:12:49.4340261Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:12:49.4343308Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_shape_base_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:12:49.434054] 2025-12-04T16:12:55.3567367Z 2025-12-04T16:12:55.3568856Z torch_np/numpy_tests/lib/test_shape_base_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_shape_base__1.1_462d874ba4c079f0_.log 2025-12-04T16:12:55.3599359Z Running 73 items in this shard: test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_argequivalent, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_invalid, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_replace_max, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_0d_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_3d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion_ma, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_scalar_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple101, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_tuple_func1d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_with_iterable_object, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyOverAxes::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_out_of_range, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_tuple, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_functionality, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_repeated_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_high_bound, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_low_bound, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_0_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_cols, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_default, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows_greater_max_int32, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_equal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_unequal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_1D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_2D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_generator, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_generator, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_3D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic_2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis_handling, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_contiguous, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_type, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a0_shape_b0, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a1_shape_b1, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a2_shape_b2, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a3_shape_b3, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a4_shape_b4, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a5_shape_b5, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_kroncompare, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_tile_one_repetition_on_array_gh4679, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestMayShareMemory::test_basic 2025-12-04T16:12:55.3629254Z 2025-12-04T16:12:55.3629676Z Finished torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 16:12:55.356642][25132.9665502], took 0.10min 2025-12-04T16:12:55.4003272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.xml 2025-12-04T16:12:55.4352201Z Running test_cpp_extensions_jit 1/1 ... [2025-12-04 16:12:55.434989][25133.044897475] 2025-12-04T16:12:55.4352765Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:12:55.4356070Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_jit.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:12:55.435369] 2025-12-04T16:16:42.9165179Z 2025-12-04T16:16:42.9166228Z test_cpp_extensions_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_jit_1.1_53eadff4adfe6cf3_.log 2025-12-04T16:16:42.9184857Z Running 35 items in this shard: test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_aoti_torch_call_dispatcher, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_autograd_from_cpp, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_compilation_error_formatting, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_same_output_as_python, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_up_to_date_attributes, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op_with_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_non_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_pluggable_allocator_include, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_compound_op_autograd, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_functorch_error, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_gen_extension_h_pch, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_half_support, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_custom_op_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_multiple_sources_and_no_functions, 
test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_throws_when_functions_is_bad, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_dict, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_list, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_xpu, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_compile_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_archflags, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cudnn_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_archlists, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_lenient_flag_handling_in_jit_extensions, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_load_with_non_platform_default_encoding, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_mps_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_pch_command_injection, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_reload_jit_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_returns_shared_library_path_when_is_python_module_is_true, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_set_default_type_also_changes_aten_default_type, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_warning 2025-12-04T16:16:42.9200791Z 2025-12-04T16:16:42.9201316Z Finished test_cpp_extensions_jit 1/1 ... 
[2025-12-04 16:16:42.916366][25360.526274638], took 3.79min 2025-12-04T16:16:42.9601827Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.xml 2025-12-04T16:16:44.5222298Z Uploading artifacts took 1.49 seconds 2025-12-04T16:16:44.5226841Z Running profiler/test_python_tracer 1/1 ... [2025-12-04 16:16:44.522489][25362.132397219] 2025-12-04T16:16:44.5227475Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:16:44.5232234Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:16:44.522960] 2025-12-04T16:16:55.1018689Z 2025-12-04T16:16:55.1019824Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_2f036554f4a33837_.log 2025-12-04T16:16:55.1022013Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events 2025-12-04T16:16:55.1023684Z 2025-12-04T16:16:55.1024069Z Finished profiler/test_python_tracer 1/1 ... [2025-12-04 16:16:55.101665][25372.711574683], took 0.18min 2025-12-04T16:16:55.1450409Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.xml 2025-12-04T16:16:55.2549110Z Running cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 ... 
[2025-12-04 16:16:55.254550][25372.864459302] 2025-12-04T16:16:55.2549972Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:16:55.2552779Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:16:55.255005] 2025-12-04T16:17:22.4079934Z 2025-12-04T16:17:22.4081469Z cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility_1.1_38e9912ded2d6880_.log 2025-12-04T16:17:22.4101625Z Running 23 items in this shard: test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_get_any_data_ptr_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_get_template_any_data_ptr_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_make_tensor_clones_and_call_foreach_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_mv_tensor_accessor_cpu_works_with_2_9, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_mv_tensor_accessor_cuda_works_with_2_9, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my__foreach_mul__requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my__foreach_mul_requires_2_10, 
test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_empty_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_reshape_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_shape_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_string_op_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_view_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_cublas_handle_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_cuda_stream_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_constructor_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_equality_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_index_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_is_cpu_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_is_cuda_requires_2_10, 
test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_set_index_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_get_num_threads_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_parallel_for_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_tensor_device_requires_2_10 2025-12-04T16:17:22.4120314Z 2025-12-04T16:17:22.4120929Z Finished cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 ... [2025-12-04 16:17:22.407791][25400.017699517], took 0.45min 2025-12-04T16:17:22.4517704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.xml 2025-12-04T16:17:22.5423640Z Running distributions/test_distributions 1/1 ... [2025-12-04 16:17:22.542076][25400.151984717] 2025-12-04T16:17:22.5424273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T16:17:22.5427429Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributions/test_distributions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 16:17:22.542480] 2025-12-04T16:18:40.2117689Z 2025-12-04T16:18:40.2120765Z distributions/test_distributions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributions.test_distributions_1.1_10129d86baeaadf5_.log 2025-12-04T16:18:40.2228680Z Running 230 items in this shard: test/distributions/test_distributions.py::TestDistributions::test_argmax_relaxed_categorical, test/distributions/test_distributions.py::TestDistributions::test_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_beta_log_prob, test/distributions/test_distributions.py::TestDistributions::test_beta_sample, test/distributions/test_distributions.py::TestDistributions::test_beta_shape, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow_gpu, test/distributions/test_distributions.py::TestDistributions::test_binomial, test/distributions/test_distributions.py::TestDistributions::test_binomial_bfloat16, test/distributions/test_distributions.py::TestDistributions::test_binomial_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_binomial_extreme_vals, test/distributions/test_distributions.py::TestDistributions::test_binomial_half, test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_binomial_sample, test/distributions/test_distributions.py::TestDistributions::test_binomial_stable, test/distributions/test_distributions.py::TestDistributions::test_binomial_vectorized_count, 
test/distributions/test_distributions.py::TestDistributions::test_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_categorical_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_cauchy, test/distributions/test_distributions.py::TestDistributions::test_cdf_icdf_inverse, test/distributions/test_distributions.py::TestDistributions::test_cdf_log_prob, test/distributions/test_distributions.py::TestDistributions::test_chi2_sample, test/distributions/test_distributions.py::TestDistributions::test_chi2_shape, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob_zero, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_mode, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_sample, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributions::test_distribution_expand, test/distributions/test_distributions.py::TestDistributions::test_distribution_subclass_expand, test/distributions/test_distributions.py::TestDistributions::test_enumerate_support_type, test/distributions/test_distributions.py::TestDistributions::test_exponential, test/distributions/test_distributions.py::TestDistributions::test_exponential_sample, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_sample, 
test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_shape, test/distributions/test_distributions.py::TestDistributions::test_gamma_log_prob_at_boundary, test/distributions/test_distributions.py::TestDistributions::test_gamma_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_shape, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_geometric, test/distributions/test_distributions.py::TestDistributions::test_geometric_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_geometric_sample, test/distributions/test_distributions.py::TestDistributions::test_gumbel, test/distributions/test_distributions.py::TestDistributions::test_gumbel_sample, test/distributions/test_distributions.py::TestDistributions::test_halfcauchy, test/distributions/test_distributions.py::TestDistributions::test_halfnormal, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_has_examples, test/distributions/test_distributions.py::TestDistributions::test_independent_expand, test/distributions/test_distributions.py::TestDistributions::test_independent_shape, test/distributions/test_distributions.py::TestDistributions::test_invalid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_inversegamma, test/distributions/test_distributions.py::TestDistributions::test_inversegamma_sample, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_mean_variance, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_shape, 
test/distributions/test_distributions.py::TestDistributions::test_laplace, test/distributions/test_distributions.py::TestDistributions::test_laplace_sample, test/distributions/test_distributions.py::TestDistributions::test_lazy_property_grad, test/distributions/test_distributions.py::TestDistributions::test_lkj_cholesky_log_prob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lognormal, test/distributions/test_distributions.py::TestDistributions::test_lognormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_lognormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_binomial_log_prob, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_sample, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributions::test_mode, test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d, 
test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_multinomial_2d, test/distributions/test_distributions.py::TestDistributions::test_multinomial_sequential_draw, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_normal, test/distributions/test_distributions.py::TestDistributions::test_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_pareto, test/distributions/test_distributions.py::TestDistributions::test_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_forward_ad, test/distributions/test_distributions.py::TestDistributions::test_poisson_gpu_sample, 
test/distributions/test_distributions.py::TestDistributions::test_poisson_log_prob, test/distributions/test_distributions.py::TestDistributions::test_poisson_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_shape, test/distributions/test_distributions.py::TestDistributions::test_relaxed_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_repr, test/distributions/test_distributions.py::TestDistributions::test_rounded_relaxed_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_rsample_requires_grad, test/distributions/test_distributions.py::TestDistributions::test_sample_detached, test/distributions/test_distributions.py::TestDistributions::test_studentT, test/distributions/test_distributions.py::TestDistributions::test_studentT_log_prob, test/distributions/test_distributions.py::TestDistributions::test_studentT_sample, test/distributions/test_distributions.py::TestDistributions::test_support_attributes, test/distributions/test_distributions.py::TestDistributions::test_torch_binomial_dtype_errors, test/distributions/test_distributions.py::TestDistributions::test_uniform, test/distributions/test_distributions.py::TestDistributions::test_valid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_vonmises_logprob, test/distributions/test_distributions.py::TestDistributions::test_vonmises_sample, test/distributions/test_distributions.py::TestDistributions::test_wishart_log_prob, test/distributions/test_distributions.py::TestDistributions::test_wishart_moments, test/distributions/test_distributions.py::TestDistributions::test_wishart_properties, test/distributions/test_distributions.py::TestDistributions::test_wishart_sample, 
test/distributions/test_distributions.py::TestDistributions::test_wishart_shape, test/distributions/test_distributions.py::TestDistributions::test_wishart_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_zero_excluded_binomial, test/distributions/test_distributions.py::TestRsample::test_beta_wrt_alpha, test/distributions/test_distributions.py::TestRsample::test_beta_wrt_beta, test/distributions/test_distributions.py::TestRsample::test_chi2, test/distributions/test_distributions.py::TestRsample::test_dirichlet_multivariate, test/distributions/test_distributions.py::TestRsample::test_dirichlet_on_diagonal, test/distributions/test_distributions.py::TestRsample::test_dirichlet_tangent_field, test/distributions/test_distributions.py::TestRsample::test_gamma, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape_vectorized_n, test/distributions/test_distributions.py::TestDistributionShapes::test_categorical_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_scalar_params, 
test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_entropy_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_scalar_param, test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_tensor_param, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gumbel_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_kumaraswamy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_mean_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_multinomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_tensor_params, 
test/distributions/test_distributions.py::TestDistributionShapes::test_one_hot_categorical_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_pareto_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_weibull_scale_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_tensor_params, test/distributions/test_distributions.py::TestKL::test_entropy_exponential_family, test/distributions/test_distributions.py::TestKL::test_entropy_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_edgecases, test/distributions/test_distributions.py::TestKL::test_kl_exponential_family, test/distributions/test_distributions.py::TestKL::test_kl_infinite, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched_broadcasted, 
test/distributions/test_distributions.py::TestKL::test_kl_shape, test/distributions/test_distributions.py::TestKL::test_kl_transformed, test/distributions/test_distributions.py::TestConstraints::test_params_constraints, test/distributions/test_distributions.py::TestConstraints::test_support_constraints, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob_with_logits, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob_with_logits, test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_logits_initialization, test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_probs_initialization, test/distributions/test_distributions.py::TestAgainstScipy::test_cdf, test/distributions/test_distributions.py::TestAgainstScipy::test_icdf, test/distributions/test_distributions.py::TestAgainstScipy::test_mean, test/distributions/test_distributions.py::TestAgainstScipy::test_variance_stddev, test/distributions/test_distributions.py::TestFunctors::test_cat_event_dim, test/distributions/test_distributions.py::TestFunctors::test_cat_transform, 
test/distributions/test_distributions.py::TestFunctors::test_cat_transform_non_uniform, test/distributions/test_distributions.py::TestFunctors::test_stack_transform, test/distributions/test_distributions.py::TestValidation::test_invalid, test/distributions/test_distributions.py::TestValidation::test_invalid_log_probs_arg, test/distributions/test_distributions.py::TestValidation::test_valid, test/distributions/test_distributions.py::TestValidation::test_warning_unimplemented_constraints, test/distributions/test_distributions.py::TestJit::test_cdf, test/distributions/test_distributions.py::TestJit::test_entropy, test/distributions/test_distributions.py::TestJit::test_enumerate_support, test/distributions/test_distributions.py::TestJit::test_log_prob, test/distributions/test_distributions.py::TestJit::test_mean, test/distributions/test_distributions.py::TestJit::test_rsample, test/distributions/test_distributions.py::TestJit::test_sample, test/distributions/test_distributions.py::TestJit::test_variance 2025-12-04T16:18:40.2335399Z 2025-12-04T16:18:40.2335815Z Finished distributions/test_distributions 1/1 ... [2025-12-04 16:18:40.211950][25477.821855577], took 1.29min 2025-12-04T16:18:40.2563004Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.xml 2025-12-04T16:18:47.2843273Z Running test batch 'tests to run' cost 23636.91 seconds 2025-12-04T16:18:47.2859440Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.2863489Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9637688d12c11f0bad30242ac110002 2025-12-04T16:18:47.3909731Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9637688d12c11f0bad30242ac110002 2025-12-04T16:18:47.3925108Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.3928166Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e973b566d12c11f0bad30242ac110002 2025-12-04T16:18:47.4497742Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e973b566d12c11f0bad30242ac110002 2025-12-04T16:18:47.4513717Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.4516374Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e97cafccd12c11f0bad30242ac110002 2025-12-04T16:18:47.4885409Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e97cafccd12c11f0bad30242ac110002 2025-12-04T16:18:47.4900628Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.4903727Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e982982ed12c11f0bad30242ac110002 2025-12-04T16:18:47.5285845Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e982982ed12c11f0bad30242ac110002 2025-12-04T16:18:47.5301579Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.5303976Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e988b416d12c11f0bad30242ac110002 2025-12-04T16:18:47.5646659Z Done! 
Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e988b416d12c11f0bad30242ac110002 2025-12-04T16:18:47.5661550Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.5663930Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e98e3292d12c11f0bad30242ac110002 2025-12-04T16:18:47.6403423Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e98e3292d12c11f0bad30242ac110002 2025-12-04T16:18:47.6418491Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.6420749Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e999bf36d12c11f0bad30242ac110002 2025-12-04T16:18:47.6962786Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e999bf36d12c11f0bad30242ac110002 2025-12-04T16:18:47.6978642Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.6980889Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a24ac0d12c11f0bad30242ac110002 2025-12-04T16:18:47.7304676Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a24ac0d12c11f0bad30242ac110002 2025-12-04T16:18:47.7320133Z Emitting td_test_failure_stats_v2 2025-12-04T16:18:47.7322236Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a7806cd12c11f0bad30242ac110002 2025-12-04T16:18:47.8066677Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a7806cd12c11f0bad30242ac110002 2025-12-04T16:18:47.8068174Z inductor/test_aot_inductor 4/6 failed! 2025-12-04T16:18:47.8068580Z inductor/test_kernel_benchmark 1/1 failed! 2025-12-04T16:18:47.8069087Z inductor/test_pattern_matcher 1/1 failed! 
2025-12-04T16:18:47.8069469Z inductor/test_cuda_repro 1/1 failed! 2025-12-04T16:18:47.8069870Z inductor/test_cuda_select_algorithm 4/5 failed! 2025-12-04T16:18:47.8070290Z inductor/test_native_matmul 1/2 failed! 2025-12-04T16:18:47.8070647Z inductor/test_memory 1/1 failed! 2025-12-04T16:18:47.8071006Z inductor/test_unbacked_symints 1/1 failed! 2025-12-04T16:18:47.8071425Z inductor/test_mix_order_reduction 1/2 failed! 2025-12-04T16:18:48.5790468Z 2025-12-04T16:18:48.5791024Z real 394m5.349s 2025-12-04T16:18:48.5791554Z user 405m53.780s 2025-12-04T16:18:48.5791831Z sys 54m54.899s 2025-12-04T16:18:48.5792074Z + sccache_epilogue 2025-12-04T16:18:48.5792399Z + echo '::group::Sccache Compilation Log' 2025-12-04T16:18:48.5793107Z ##[group]Sccache Compilation Log 2025-12-04T16:18:48.5793546Z + echo '=================== sccache compilation log ===================' 2025-12-04T16:18:48.5794016Z =================== sccache compilation log =================== 2025-12-04T16:18:48.5794767Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log 2025-12-04T16:18:48.5941586Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ===========' 2025-12-04T16:18:48.5942619Z =========== If your build fails, please take a look at the log above for possible reasons =========== 2025-12-04T16:18:48.5943191Z + sccache --show-stats 2025-12-04T16:18:48.5976994Z Compile requests 5079 2025-12-04T16:18:48.5977405Z Compile requests executed 265 2025-12-04T16:18:48.5977754Z Cache hits 116 2025-12-04T16:18:48.5978109Z Cache hits (C/C++) 116 2025-12-04T16:18:48.5978457Z Cache misses 129 2025-12-04T16:18:48.5979217Z Cache misses (C/C++) 129 2025-12-04T16:18:48.5979791Z Cache hits rate 47.35 % 2025-12-04T16:18:48.5980404Z Cache hits rate (C/C++) 47.35 % 2025-12-04T16:18:48.5981156Z Cache timeouts 0 2025-12-04T16:18:48.5981802Z Cache read errors 0 2025-12-04T16:18:48.5982159Z Forced recaches 0 
2025-12-04T16:18:48.5982521Z Cache write errors 0 2025-12-04T16:18:48.5982892Z Cache errors 0 2025-12-04T16:18:48.5983233Z Compilations 129 2025-12-04T16:18:48.5983605Z Compilation failures 20 2025-12-04T16:18:48.5983986Z Non-cacheable compilations 0 2025-12-04T16:18:48.5984347Z Non-cacheable calls 145 2025-12-04T16:18:48.5984719Z Non-compilation calls 4669 2025-12-04T16:18:48.5985092Z Unsupported compiler calls 0 2025-12-04T16:18:48.5985453Z Average cache write 0.048 s 2025-12-04T16:18:48.5985831Z Average compiler 8.181 s 2025-12-04T16:18:48.5986203Z Average cache read hit 0.049 s 2025-12-04T16:18:48.5986568Z Failed distributed compilations 0 2025-12-04T16:18:48.5986834Z 2025-12-04T16:18:48.5986949Z Non-cacheable reasons: 2025-12-04T16:18:48.5987250Z unknown source language 80 2025-12-04T16:18:48.5987607Z -E 65 2025-12-04T16:18:48.5987861Z 2025-12-04T16:18:48.5988129Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T16:18:48.5988663Z Version (client) 0.10.0 2025-12-04T16:18:48.5989175Z + sccache --stop-server 2025-12-04T16:18:48.6004194Z Stopping sccache server... 
2025-12-04T16:18:48.6006595Z Compile requests 5079 2025-12-04T16:18:48.6006990Z Compile requests executed 265 2025-12-04T16:18:48.6007345Z Cache hits 116 2025-12-04T16:18:48.6007701Z Cache hits (C/C++) 116 2025-12-04T16:18:48.6008073Z Cache misses 129 2025-12-04T16:18:48.6008418Z Cache misses (C/C++) 129 2025-12-04T16:18:48.6008787Z Cache hits rate 47.35 % 2025-12-04T16:18:48.6009172Z Cache hits rate (C/C++) 47.35 % 2025-12-04T16:18:48.6009579Z Cache timeouts 0 2025-12-04T16:18:48.6010066Z Cache read errors 0 2025-12-04T16:18:48.6010655Z Forced recaches 0 2025-12-04T16:18:48.6011126Z Cache write errors 0 2025-12-04T16:18:48.6011471Z Cache errors 0 2025-12-04T16:18:48.6011830Z Compilations 129 2025-12-04T16:18:48.6012214Z Compilation failures 20 2025-12-04T16:18:48.6012582Z Non-cacheable compilations 0 2025-12-04T16:18:48.6012958Z Non-cacheable calls 145 2025-12-04T16:18:48.6013323Z Non-compilation calls 4669 2025-12-04T16:18:48.6013701Z Unsupported compiler calls 0 2025-12-04T16:18:48.6014065Z Average cache write 0.048 s 2025-12-04T16:18:48.6014444Z Average compiler 8.181 s 2025-12-04T16:18:48.6014817Z Average cache read hit 0.049 s 2025-12-04T16:18:48.6015190Z Failed distributed compilations 0 2025-12-04T16:18:48.6015460Z 2025-12-04T16:18:48.6015568Z Non-cacheable reasons: 2025-12-04T16:18:48.6015874Z unknown source language 80 2025-12-04T16:18:48.6016216Z -E 65 2025-12-04T16:18:48.6016461Z 2025-12-04T16:18:48.6016727Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: / 2025-12-04T16:18:48.6017257Z Version (client) 0.10.0 2025-12-04T16:18:48.6017622Z + echo ::endgroup:: 2025-12-04T16:18:48.6018164Z ##[endgroup] 2025-12-04T16:18:48.6018403Z + cleanup_workspace 2025-12-04T16:18:48.6018987Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.' 2025-12-04T16:18:48.6020078Z sudo may print the following warning message that can be ignored. The chown command will still run. 
2025-12-04T16:18:48.6020806Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted' 2025-12-04T16:18:48.6021412Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted 2025-12-04T16:18:48.6022102Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42' 2025-12-04T16:18:48.6022801Z For more details refer to https://github.com/sudo-project/sudo/issues/42 2025-12-04T16:18:48.6023342Z + sudo chown -R 1000 /var/lib/jenkins/workspace 2025-12-04T16:18:49.3825717Z ##[error]Process completed with exit code 1. 2025-12-04T16:18:49.3914553Z Prepare all required actions 2025-12-04T16:18:49.3915019Z Getting action download info 2025-12-04T16:18:49.5981821Z ##[group]Run ./.github/actions/pytest-cache-upload 2025-12-04T16:18:49.5982228Z with: 2025-12-04T16:18:49.5982485Z cache_dir: .pytest_cache 2025-12-04T16:18:49.5982786Z shard: 4 2025-12-04T16:18:49.5983082Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T16:18:49.5983492Z test_config: legacy_nvidia_driver 2025-12-04T16:18:49.5983937Z job_identifier: periodic_linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T16:18:49.5984380Z env: 2025-12-04T16:18:49.5984603Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:49.5984915Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:49.5985282Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:49.5985920Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:49.5986497Z ##[endgroup] 2025-12-04T16:18:49.6024647Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T16:18:49.6025063Z with: 2025-12-04T16:18:49.6025288Z shell: bash 2025-12-04T16:18:49.6025542Z timeout_minutes: 5 2025-12-04T16:18:49.6025819Z max_attempts: 5 2025-12-04T16:18:49.6026080Z retry_wait_seconds: 30 2025-12-04T16:18:49.6026477Z command: set -eu python3 -m pip install boto3==1.35.42 2025-12-04T16:18:49.6026924Z polling_interval_seconds: 1 2025-12-04T16:18:49.6027256Z warning_on_retry: true 2025-12-04T16:18:49.6027543Z continue_on_error: 
false 2025-12-04T16:18:49.6027839Z env: 2025-12-04T16:18:49.6028079Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:49.6028369Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:49.6028735Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:49.6029383Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:49.6029949Z ##[endgroup] 2025-12-04T16:18:49.9990819Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T16:18:51.2908527Z Collecting boto3==1.35.42 2025-12-04T16:18:51.3110175Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB) 2025-12-04T16:18:51.4038269Z Collecting s3transfer<0.11.0,>=0.10.0 2025-12-04T16:18:51.4085492Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB) 2025-12-04T16:18:52.7737998Z Collecting botocore<1.36.0,>=1.35.42 2025-12-04T16:18:52.7796126Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB) 2025-12-04T16:18:52.9416840Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0) 2025-12-04T16:18:52.9487178Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10) 2025-12-04T16:18:52.9492160Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1) 2025-12-04T16:18:53.1906517Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0) 2025-12-04T16:18:53.2962599Z Installing collected packages: botocore, s3transfer, boto3 2025-12-04T16:18:53.9442925Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4 2025-12-04T16:18:54.6923608Z Command completed after 1 attempt(s). 
2025-12-04T16:18:54.6979260Z ##[group]Run python3 .github/scripts/pytest_cache.py \ 2025-12-04T16:18:54.6979893Z python3 .github/scripts/pytest_cache.py \ 2025-12-04T16:18:54.6980315Z  --upload \ 2025-12-04T16:18:54.6980679Z  --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \ 2025-12-04T16:18:54.6981192Z  --pr_identifier "$GITHUB_REF" \ 2025-12-04T16:18:54.6981635Z  --job_identifier "$JOB_IDENTIFIER" \ 2025-12-04T16:18:54.6982027Z  --sha "$SHA" \ 2025-12-04T16:18:54.6982348Z  --test_config "$TEST_CONFIG" \ 2025-12-04T16:18:54.6982724Z  --shard "$SHARD" \ 2025-12-04T16:18:54.6983296Z  --repo "$REPO" \ 2025-12-04T16:18:54.6983625Z  --temp_dir "$RUNNER_TEMP" \ 2025-12-04T16:18:54.6994434Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:54.6994897Z env: 2025-12-04T16:18:54.6995150Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:54.6995450Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:54.6995814Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:54.6996476Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:54.6997071Z CACHE_DIR: .pytest_cache 2025-12-04T16:18:54.6997473Z JOB_IDENTIFIER: periodic_linux-jammy-cuda12.4-py3.10-gcc11 2025-12-04T16:18:54.6997977Z SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T16:18:54.6998385Z TEST_CONFIG: legacy_nvidia_driver 2025-12-04T16:18:54.6998709Z SHARD: 4 2025-12-04T16:18:54.6998956Z REPO: pytorch/pytorch 2025-12-04T16:18:54.6999243Z ##[endgroup] 2025-12-04T16:18:55.2051285Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013` 2025-12-04T16:18:55.2053735Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.4-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='legacy_nvidia_driver', shard='4', repo='pytorch/pytorch', 
temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None) 2025-12-04T16:18:55.2056135Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache 2025-12-04T16:18:55.2057701Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4 2025-12-04T16:18:55.2060206Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4.zip 2025-12-04T16:18:55.2062528Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4.zip 2025-12-04T16:18:55.2627102Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T16:18:55.2627579Z cat test/**/*_toprint.log || true 2025-12-04T16:18:55.2634290Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:55.2634733Z env: 2025-12-04T16:18:55.2634985Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:55.2635293Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:55.2635657Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:55.2636315Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:55.2636902Z ##[endgroup] 2025-12-04T16:18:55.2739659Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T16:18:55.2769638Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2025-12-04T16:18:55.2770069Z kill "$MONITOR_SCRIPT_PID" 2025-12-04T16:18:55.2776305Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:55.2776758Z env: 2025-12-04T16:18:55.2777008Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:55.2777324Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:55.2777783Z GPU_FLAG: 
--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:55.2778443Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:55.2779046Z MONITOR_SCRIPT_PID: 68866 2025-12-04T16:18:55.2779411Z ##[endgroup] 2025-12-04T16:18:55.2805970Z /home/ec2-user/actions-runner/_work/_temp/580e4b3c-c61e-4546-b3f6-8e607cf3807e.sh: line 1: kill: (68866) - No such process 2025-12-04T16:18:55.2808095Z ##[error]Process completed with exit code 1. 2025-12-04T16:18:55.2949341Z Prepare all required actions 2025-12-04T16:18:55.2949835Z Getting action download info 2025-12-04T16:18:55.5256258Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T16:18:55.7809955Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T16:18:56.3065454Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T16:18:56.3065880Z with: 2025-12-04T16:18:56.3066360Z file-suffix: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T16:18:56.3066959Z s3-bucket: gha-artifacts 2025-12-04T16:18:56.3067256Z env: 2025-12-04T16:18:56.3067483Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.3067801Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.3068164Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.3068799Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.3069426Z ##[endgroup] 2025-12-04T16:18:56.3095932Z ##[group]Run # Remove any previous test jsons if they exist 2025-12-04T16:18:56.3096459Z # Remove any previous test jsons if they exist 2025-12-04T16:18:56.3096903Z rm -f test-jsons-*.zip 2025-12-04T16:18:56.3097408Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json' 2025-12-04T16:18:56.3104465Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:56.3104909Z env: 2025-12-04T16:18:56.3105159Z 
GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.3105473Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.3105826Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.3106479Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.3107292Z FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T16:18:56.3107863Z ##[endgroup] 2025-12-04T16:18:56.3329005Z adding: test/test-reports/td_exclusions-a20106558b25723d42f9.json (deflated 82%) 2025-12-04T16:18:56.3335390Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.json (deflated 93%) 2025-12-04T16:18:56.3337375Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.json (deflated 91%) 2025-12-04T16:18:56.3339532Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.json (deflated 91%) 2025-12-04T16:18:56.3344031Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.json (deflated 94%) 2025-12-04T16:18:56.3374267Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.json (deflated 93%) 2025-12-04T16:18:56.3399745Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.json (deflated 94%) 2025-12-04T16:18:56.3403579Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.json (deflated 89%) 2025-12-04T16:18:56.3405655Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.json (deflated 90%) 2025-12-04T16:18:56.3407818Z adding: 
test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.json (deflated 90%) 2025-12-04T16:18:56.3409402Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.json (deflated 76%) 2025-12-04T16:18:56.3414935Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.json (deflated 96%) 2025-12-04T16:18:56.3419979Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.json (deflated 95%) 2025-12-04T16:18:56.3425713Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.json (deflated 95%) 2025-12-04T16:18:56.3430484Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.json (deflated 90%) 2025-12-04T16:18:56.3432395Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.json (deflated 90%) 2025-12-04T16:18:56.3434352Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.json (deflated 90%) 2025-12-04T16:18:56.3435823Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.json (deflated 90%) 2025-12-04T16:18:56.3437605Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.json (deflated 87%) 2025-12-04T16:18:56.3438879Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.json (deflated 81%) 2025-12-04T16:18:56.3440140Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.json (deflated 81%) 2025-12-04T16:18:56.3442964Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.json (deflated 92%) 2025-12-04T16:18:56.3457377Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.json (deflated 95%) 2025-12-04T16:18:56.3471784Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.json (deflated 95%) 2025-12-04T16:18:56.3476416Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.json (deflated 91%) 2025-12-04T16:18:56.3485885Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.json (deflated 97%) 2025-12-04T16:18:56.3495219Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.json (deflated 97%) 2025-12-04T16:18:56.3497787Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.json (deflated 91%) 2025-12-04T16:18:56.3510210Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.json (deflated 92%) 2025-12-04T16:18:56.3511585Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.json (deflated 57%) 2025-12-04T16:18:56.3517955Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.json (deflated 92%) 2025-12-04T16:18:56.3519399Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.json (deflated 85%) 2025-12-04T16:18:56.3520881Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.json (deflated 85%) 2025-12-04T16:18:56.3522433Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.json (deflated 85%) 2025-12-04T16:18:56.3524028Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.json (deflated 85%) 2025-12-04T16:18:56.3525525Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.json (deflated 85%) 2025-12-04T16:18:56.3527183Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.json (deflated 85%) 2025-12-04T16:18:56.3528664Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.json (deflated 86%) 2025-12-04T16:18:56.3530152Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.json (deflated 85%) 2025-12-04T16:18:56.3531643Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.json (deflated 85%) 2025-12-04T16:18:56.3533134Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.json (deflated 86%) 2025-12-04T16:18:56.3534620Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.json (deflated 85%) 2025-12-04T16:18:56.3536094Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.json (deflated 85%) 2025-12-04T16:18:56.3537581Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.json (deflated 85%) 2025-12-04T16:18:56.3539069Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.json (deflated 85%) 2025-12-04T16:18:56.3540554Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.json (deflated 85%) 2025-12-04T16:18:56.3542048Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.json (deflated 85%) 2025-12-04T16:18:56.3543521Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.json (deflated 85%) 2025-12-04T16:18:56.3545012Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.json (deflated 85%) 2025-12-04T16:18:56.3546493Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.json (deflated 86%) 2025-12-04T16:18:56.3547972Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.json (deflated 85%) 2025-12-04T16:18:56.3549450Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.json (deflated 85%) 2025-12-04T16:18:56.3550927Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.json (deflated 85%) 2025-12-04T16:18:56.3552414Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.json (deflated 85%) 2025-12-04T16:18:56.3553895Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.json (deflated 85%) 2025-12-04T16:18:56.3555376Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.json (deflated 85%) 2025-12-04T16:18:56.3556840Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.json (deflated 85%) 2025-12-04T16:18:56.3558361Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.json (deflated 85%) 2025-12-04T16:18:56.3559835Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.json (deflated 85%) 2025-12-04T16:18:56.3561411Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.json (deflated 85%) 2025-12-04T16:18:56.3562958Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.json (deflated 85%) 2025-12-04T16:18:56.3564426Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.json (deflated 85%) 2025-12-04T16:18:56.3565911Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.json (deflated 85%) 2025-12-04T16:18:56.3567401Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.json (deflated 85%) 2025-12-04T16:18:56.3568880Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.json (stored 0%) 2025-12-04T16:18:56.3570284Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.json (deflated 79%) 2025-12-04T16:18:56.3571644Z adding: 
test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.json (deflated 65%) 2025-12-04T16:18:56.3573036Z adding: test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.json (deflated 59%) 2025-12-04T16:18:56.3574402Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.json (deflated 86%) 2025-12-04T16:18:56.3575706Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.json (deflated 82%) 2025-12-04T16:18:56.3577023Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.json (deflated 82%) 2025-12-04T16:18:56.3578340Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.json (deflated 82%) 2025-12-04T16:18:56.3579671Z adding: test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.json (deflated 89%) 2025-12-04T16:18:56.3580931Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.json (deflated 92%) 2025-12-04T16:18:56.3582106Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.json (deflated 93%) 2025-12-04T16:18:56.3583287Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.json (deflated 93%) 2025-12-04T16:18:56.3584464Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.json (deflated 85%) 2025-12-04T16:18:56.3585628Z adding: test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.json (deflated 88%) 2025-12-04T16:18:56.3586886Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.json 
(deflated 91%) 2025-12-04T16:18:56.3596303Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.json (deflated 96%) 2025-12-04T16:18:56.3609522Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.json (deflated 96%) 2025-12-04T16:18:56.3610941Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.json (deflated 87%) 2025-12-04T16:18:56.3624001Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.json (deflated 96%) 2025-12-04T16:18:56.3636878Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.json (deflated 96%) 2025-12-04T16:18:56.3638397Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.json (deflated 89%) 2025-12-04T16:18:56.3639854Z adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.json (deflated 84%) 2025-12-04T16:18:56.3642264Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.json (deflated 94%) 2025-12-04T16:18:56.3643701Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.json (deflated 83%) 2025-12-04T16:18:56.3645133Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.json (deflated 83%) 2025-12-04T16:18:56.3646570Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.json (deflated 83%) 2025-12-04T16:18:56.3648004Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.json (deflated 83%) 2025-12-04T16:18:56.3649434Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.json (deflated 83%) 2025-12-04T16:18:56.3650872Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.json (deflated 86%) 2025-12-04T16:18:56.3652299Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.json (deflated 86%) 2025-12-04T16:18:56.3653733Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.json (deflated 86%) 2025-12-04T16:18:56.3655168Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.json (deflated 87%) 2025-12-04T16:18:56.3656596Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.json (deflated 83%) 2025-12-04T16:18:56.3658020Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.json (deflated 83%) 2025-12-04T16:18:56.3659457Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.json (deflated 83%) 2025-12-04T16:18:56.3660899Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.json (deflated 83%) 2025-12-04T16:18:56.3662322Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.json (deflated 83%) 2025-12-04T16:18:56.3663841Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.json (deflated 83%) 2025-12-04T16:18:56.3665264Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.json (deflated 83%) 2025-12-04T16:18:56.3666692Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.json (deflated 83%) 2025-12-04T16:18:56.3668118Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.json (deflated 83%) 2025-12-04T16:18:56.3669621Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.json (deflated 83%) 2025-12-04T16:18:56.3671034Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.json (deflated 83%) 2025-12-04T16:18:56.3672592Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.json (deflated 87%) 2025-12-04T16:18:56.3674025Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.json (deflated 86%) 2025-12-04T16:18:56.3675461Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.json (deflated 86%) 2025-12-04T16:18:56.3676899Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.json (deflated 86%) 2025-12-04T16:18:56.3678322Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.json (deflated 83%) 2025-12-04T16:18:56.3679755Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.json (deflated 83%) 2025-12-04T16:18:56.3681181Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.json (deflated 85%) 2025-12-04T16:18:56.3682689Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.json (deflated 83%) 2025-12-04T16:18:56.3684108Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.json (deflated 83%) 2025-12-04T16:18:56.3685533Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.json (deflated 86%) 2025-12-04T16:18:56.3686964Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.json (deflated 83%) 2025-12-04T16:18:56.3688389Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.json (deflated 83%) 2025-12-04T16:18:56.3689823Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.json (deflated 83%) 2025-12-04T16:18:56.3691249Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.json (deflated 83%) 2025-12-04T16:18:56.3692693Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.json (deflated 83%) 2025-12-04T16:18:56.3694120Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.json (deflated 86%) 2025-12-04T16:18:56.3695543Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.json (deflated 86%) 2025-12-04T16:18:56.3696962Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.json (deflated 86%) 2025-12-04T16:18:56.3698404Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.json (deflated 83%) 2025-12-04T16:18:56.3699858Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.json (deflated 83%) 2025-12-04T16:18:56.3701610Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.json (deflated 83%) 2025-12-04T16:18:56.3703108Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.json (deflated 83%) 2025-12-04T16:18:56.3704557Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.json (deflated 83%) 2025-12-04T16:18:56.3706122Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.json (deflated 83%) 2025-12-04T16:18:56.3707541Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.json (deflated 83%) 2025-12-04T16:18:56.3708976Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.json (deflated 83%) 2025-12-04T16:18:56.3710408Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.json (deflated 83%) 2025-12-04T16:18:56.3711842Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.json (deflated 83%) 2025-12-04T16:18:56.3713275Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.json (deflated 83%) 2025-12-04T16:18:56.3714696Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.json (deflated 83%) 2025-12-04T16:18:56.3716133Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.json (deflated 83%) 2025-12-04T16:18:56.3717568Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.json (deflated 83%) 2025-12-04T16:18:56.3719004Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.json (deflated 83%) 2025-12-04T16:18:56.3720425Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.json (deflated 83%) 2025-12-04T16:18:56.3721868Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.json (deflated 83%) 2025-12-04T16:18:56.3723384Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.json (deflated 83%) 2025-12-04T16:18:56.3724820Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.json (deflated 84%) 2025-12-04T16:18:56.3726257Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.json (deflated 83%) 2025-12-04T16:18:56.3727699Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.json (deflated 83%) 2025-12-04T16:18:56.3729132Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.json (deflated 83%) 2025-12-04T16:18:56.3730568Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.json (deflated 83%) 2025-12-04T16:18:56.3732016Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.json (deflated 83%) 2025-12-04T16:18:56.3733443Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.json (deflated 86%) 2025-12-04T16:18:56.3734871Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.json (deflated 86%) 2025-12-04T16:18:56.3736301Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.json (deflated 86%) 2025-12-04T16:18:56.3737774Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.json (deflated 85%) 2025-12-04T16:18:56.3739239Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.json (deflated 83%) 2025-12-04T16:18:56.3740716Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.json (deflated 83%) 2025-12-04T16:18:56.3742152Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.json (deflated 83%) 2025-12-04T16:18:56.3743592Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.json (deflated 83%) 2025-12-04T16:18:56.3745025Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.json (deflated 83%) 2025-12-04T16:18:56.3746445Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.json (deflated 84%) 2025-12-04T16:18:56.3747877Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.json (deflated 83%) 2025-12-04T16:18:56.3749313Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.json (deflated 83%) 2025-12-04T16:18:56.3750754Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.json (deflated 83%) 2025-12-04T16:18:56.3752175Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.json (deflated 83%) 2025-12-04T16:18:56.3753610Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.json (deflated 83%) 2025-12-04T16:18:56.3755038Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.json (deflated 90%) 2025-12-04T16:18:56.3756483Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.json (deflated 82%) 2025-12-04T16:18:56.3757913Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.json (deflated 82%) 2025-12-04T16:18:56.3759326Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.json (deflated 92%) 2025-12-04T16:18:56.3760749Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.json (deflated 83%) 2025-12-04T16:18:56.3762237Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.json (deflated 83%) 2025-12-04T16:18:56.3763674Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.json (deflated 91%) 2025-12-04T16:18:56.3765141Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.json (deflated 83%) 2025-12-04T16:18:56.3766555Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.json (deflated 83%) 2025-12-04T16:18:56.3767979Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.json (deflated 86%) 2025-12-04T16:18:56.3769408Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.json (deflated 83%) 2025-12-04T16:18:56.3770885Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.json (deflated 83%) 2025-12-04T16:18:56.3772340Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.json (deflated 86%) 2025-12-04T16:18:56.3773849Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.json (deflated 86%) 2025-12-04T16:18:56.3775283Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.json (deflated 86%) 2025-12-04T16:18:56.3776703Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.json (deflated 85%) 2025-12-04T16:18:56.3778135Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.json (deflated 83%) 2025-12-04T16:18:56.3779553Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.json (deflated 83%) 2025-12-04T16:18:56.3780985Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.json (deflated 84%) 2025-12-04T16:18:56.3782429Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.json (deflated 83%) 2025-12-04T16:18:56.3783867Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.json (deflated 83%) 2025-12-04T16:18:56.3785289Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.json (deflated 85%) 2025-12-04T16:18:56.3786719Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.json (deflated 83%) 2025-12-04T16:18:56.3788149Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.json (deflated 83%) 2025-12-04T16:18:56.3789589Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.json (deflated 83%) 2025-12-04T16:18:56.3791006Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.json (deflated 83%) 2025-12-04T16:18:56.3792434Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.json (deflated 83%) 2025-12-04T16:18:56.3793867Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.json (deflated 86%) 2025-12-04T16:18:56.3795295Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.json (deflated 83%) 2025-12-04T16:18:56.3796724Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.json (deflated 83%) 2025-12-04T16:18:56.3798145Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.json (deflated 83%) 2025-12-04T16:18:56.3799572Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.json (deflated 83%) 2025-12-04T16:18:56.3801166Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.json (deflated 83%) 2025-12-04T16:18:56.3802673Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.json (deflated 87%) 2025-12-04T16:18:56.3804536Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.json (deflated 86%) 2025-12-04T16:18:56.3805974Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.json (deflated 86%) 2025-12-04T16:18:56.3807564Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.json (deflated 86%) 2025-12-04T16:18:56.3809004Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.json (deflated 86%) 2025-12-04T16:18:56.3810423Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.json (deflated 86%) 2025-12-04T16:18:56.3811847Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.json (deflated 83%) 2025-12-04T16:18:56.3813280Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.json (deflated 83%) 2025-12-04T16:18:56.3814718Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.json (deflated 83%) 2025-12-04T16:18:56.3816156Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.json (deflated 85%) 2025-12-04T16:18:56.3817594Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.json (deflated 83%) 2025-12-04T16:18:56.3819029Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.json (deflated 83%) 2025-12-04T16:18:56.3820470Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.json (deflated 83%) 2025-12-04T16:18:56.3821909Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.json (deflated 83%) 2025-12-04T16:18:56.3823333Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.json (deflated 83%) 2025-12-04T16:18:56.3824776Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.json (deflated 83%) 2025-12-04T16:18:56.3826215Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.json (deflated 83%) 2025-12-04T16:18:56.3827646Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.json (deflated 83%) 2025-12-04T16:18:56.3829085Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.json (deflated 82%) 2025-12-04T16:18:56.4104460Z adding: test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.json (deflated 99%) 2025-12-04T16:18:56.4126210Z adding: test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.json (deflated 93%) 2025-12-04T16:18:56.4156350Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.json (deflated 97%) 2025-12-04T16:18:56.4168518Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.json (deflated 95%) 2025-12-04T16:18:56.4181791Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.json (deflated 95%) 2025-12-04T16:18:56.4193802Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.json (deflated 95%) 2025-12-04T16:18:56.4206912Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.json (deflated 95%) 2025-12-04T16:18:56.4382927Z adding: test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.json (deflated 97%) 2025-12-04T16:18:56.4399188Z adding: 
test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.json (deflated 97%) 2025-12-04T16:18:56.4436448Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.json (deflated 98%) 2025-12-04T16:18:56.4519467Z adding: test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.json (deflated 96%) 2025-12-04T16:18:56.4606445Z adding: test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.json (deflated 97%) 2025-12-04T16:18:56.4649568Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.json (deflated 95%) 2025-12-04T16:18:56.4690894Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.json (deflated 95%) 2025-12-04T16:18:56.4705835Z adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.json (deflated 97%) 2025-12-04T16:18:56.4714116Z adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.json (deflated 95%) 2025-12-04T16:18:56.4715612Z adding: test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.json (stored 0%) 2025-12-04T16:18:56.4720689Z adding: test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.json (deflated 95%) 2025-12-04T16:18:56.4721931Z adding: test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.json (deflated 95%) 2025-12-04T16:18:56.4723187Z adding: test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.json (deflated 94%) 2025-12-04T16:18:56.4724447Z adding: test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.json (deflated 62%) 2025-12-04T16:18:56.4725883Z adding: 
test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.json (stored 0%) 2025-12-04T16:18:56.4727311Z adding: test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.json (deflated 95%) 2025-12-04T16:18:56.4728653Z adding: test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.json (deflated 80%) 2025-12-04T16:18:56.4730040Z adding: test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.json (deflated 63%) 2025-12-04T16:18:56.4731441Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.json (deflated 96%) 2025-12-04T16:18:56.4732858Z adding: test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.json (deflated 74%) 2025-12-04T16:18:56.4734238Z adding: test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.json (deflated 57%) 2025-12-04T16:18:56.4735755Z adding: test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.json (deflated 83%) 2025-12-04T16:18:56.4737251Z adding: test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.json (deflated 89%) 2025-12-04T16:18:56.4738604Z adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.json (deflated 76%) 2025-12-04T16:18:56.4739866Z adding: test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.json (deflated 75%) 2025-12-04T16:18:56.4741057Z adding: test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.json (deflated 87%) 2025-12-04T16:18:56.4742518Z 
adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.json (deflated 72%) 2025-12-04T16:18:56.4744010Z adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.json (deflated 92%) 2025-12-04T16:18:56.4745349Z adding: test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.json (deflated 77%) 2025-12-04T16:18:56.4746639Z adding: test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.json (deflated 76%) 2025-12-04T16:18:56.4772809Z adding: test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.json (deflated 97%) 2025-12-04T16:18:56.4773997Z adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.json (deflated 33%) 2025-12-04T16:18:56.4812422Z adding: test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.json (deflated 96%) 2025-12-04T16:18:56.4813871Z adding: test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.json (deflated 65%) 2025-12-04T16:18:56.4815382Z adding: test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.json (deflated 71%) 2025-12-04T16:18:56.4816718Z adding: test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.json (deflated 88%) 2025-12-04T16:18:56.4826606Z adding: test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.json (deflated 94%) 2025-12-04T16:18:56.4827984Z adding: test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.json (deflated 85%) 2025-12-04T16:18:56.4829412Z adding: 
test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.json (deflated 74%) 2025-12-04T16:18:56.4830825Z adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.json (deflated 74%) 2025-12-04T16:18:56.4832306Z adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.json (deflated 87%) 2025-12-04T16:18:56.4833681Z adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.json (deflated 70%) 2025-12-04T16:18:56.4835039Z adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.json (deflated 62%) 2025-12-04T16:18:56.4836343Z adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.json (deflated 66%) 2025-12-04T16:18:56.4853473Z adding: test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.json (deflated 98%) 2025-12-04T16:18:56.4866691Z adding: test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.json (deflated 98%) 2025-12-04T16:18:56.4867856Z adding: test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.json (deflated 78%) 2025-12-04T16:18:56.4869095Z adding: test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.json (deflated 87%) 2025-12-04T16:18:56.4870332Z adding: test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.json (deflated 78%) 2025-12-04T16:18:56.4871461Z adding: test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.json (deflated 79%) 2025-12-04T16:18:56.4872781Z adding: 
test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.json (deflated 39%) 2025-12-04T16:18:56.4874716Z adding: test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.json (deflated 89%) 2025-12-04T16:18:56.4875905Z adding: test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.json (deflated 84%) 2025-12-04T16:18:56.4877242Z adding: test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.json (deflated 84%) 2025-12-04T16:18:56.4878598Z adding: test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.json (deflated 83%) 2025-12-04T16:18:56.4991541Z adding: test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.json (deflated 88%) 2025-12-04T16:18:56.4992711Z adding: test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.json (deflated 86%) 2025-12-04T16:18:56.5014760Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.json (deflated 97%) 2025-12-04T16:18:56.5016424Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.json (deflated 87%) 2025-12-04T16:18:56.5017691Z adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.json (deflated 92%) 2025-12-04T16:18:56.5018928Z adding: test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.json (deflated 78%) 2025-12-04T16:18:56.5022126Z adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.json (deflated 90%) 2025-12-04T16:18:56.5028531Z adding: test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.json (deflated 92%) 
2025-12-04T16:18:56.5029809Z adding: test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.json (deflated 69%) 2025-12-04T16:18:56.5032753Z adding: test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.json (deflated 95%) 2025-12-04T16:18:56.5034047Z adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.json (deflated 84%) 2025-12-04T16:18:56.5035268Z adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.json (deflated 94%) 2025-12-04T16:18:56.5036619Z adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.json (deflated 94%) 2025-12-04T16:18:56.5037919Z adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.json (deflated 72%) 2025-12-04T16:18:56.5042369Z adding: test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.json (deflated 93%) 2025-12-04T16:18:56.5044308Z adding: test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.json (deflated 90%) 2025-12-04T16:18:56.5045553Z adding: test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.json (stored 0%) 2025-12-04T16:18:56.5046821Z adding: test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.json (deflated 59%) 2025-12-04T16:18:56.5048210Z adding: test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.json (deflated 81%) 2025-12-04T16:18:56.5049522Z adding: test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.json (deflated 78%) 2025-12-04T16:18:56.5050727Z adding: 
test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.json (deflated 82%) 2025-12-04T16:18:56.5051858Z adding: test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.json (deflated 88%) 2025-12-04T16:18:56.5053417Z adding: test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.json (deflated 93%) 2025-12-04T16:18:56.5054616Z adding: test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.json (deflated 85%) 2025-12-04T16:18:56.5056324Z adding: test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.json (deflated 95%) 2025-12-04T16:18:56.5057536Z adding: test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.json (deflated 86%) 2025-12-04T16:18:56.5058639Z adding: test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.json (deflated 88%) 2025-12-04T16:18:56.5066470Z adding: test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.json (deflated 98%) 2025-12-04T16:18:56.5067717Z adding: test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.json (deflated 80%) 2025-12-04T16:18:56.5069024Z adding: test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.json (deflated 93%) 2025-12-04T16:18:56.5101002Z adding: test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.json (deflated 97%) 2025-12-04T16:18:56.5101981Z adding: test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.json (deflated 90%) 2025-12-04T16:18:56.5103029Z adding: test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.json (deflated 32%) 2025-12-04T16:18:56.5106032Z adding: test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.json (deflated 92%) 
2025-12-04T16:18:56.5107373Z adding: test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.json (deflated 94%) 2025-12-04T16:18:56.5108483Z adding: test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.json (deflated 94%) 2025-12-04T16:18:56.5109520Z adding: test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.json (deflated 83%) 2025-12-04T16:18:56.5110597Z adding: test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.json (deflated 93%) 2025-12-04T16:18:56.5111740Z adding: test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.json (deflated 84%) 2025-12-04T16:18:56.5117725Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.json (deflated 96%) 2025-12-04T16:18:56.5141073Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.json (deflated 97%) 2025-12-04T16:18:56.5142333Z adding: test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.json (deflated 33%) 2025-12-04T16:18:56.5155269Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.json (deflated 97%) 2025-12-04T16:18:56.5159079Z adding: test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.json (deflated 97%) 2025-12-04T16:18:56.5161699Z adding: test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.json (deflated 93%) 2025-12-04T16:18:56.5181261Z adding: test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.json (deflated 97%) 2025-12-04T16:18:56.5185293Z adding: test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.json (deflated 95%) 2025-12-04T16:18:56.5187602Z adding: 
test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.json (deflated 97%) 2025-12-04T16:18:56.5189646Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.json (deflated 95%) 2025-12-04T16:18:56.5191122Z adding: test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.json (deflated 89%) 2025-12-04T16:18:56.5192414Z adding: test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.json (deflated 73%) 2025-12-04T16:18:56.5197157Z adding: test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.json (deflated 96%) 2025-12-04T16:18:56.5203113Z adding: test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.json (deflated 94%) 2025-12-04T16:18:56.5234202Z ##[group]Run # Remove any previous test reports if they exist 2025-12-04T16:18:56.5234770Z # Remove any previous test reports if they exist 2025-12-04T16:18:56.5235226Z rm -f test-reports-*.zip 2025-12-04T16:18:56.5235786Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv' 2025-12-04T16:18:56.5242777Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:56.5243220Z env: 2025-12-04T16:18:56.5243480Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.5243780Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.5244151Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.5244816Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.5245617Z FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T16:18:56.5246196Z ##[endgroup] 
2025-12-04T16:18:56.5386452Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml (deflated 92%) 2025-12-04T16:18:56.5388354Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml (deflated 90%) 2025-12-04T16:18:56.5390480Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml (deflated 90%) 2025-12-04T16:18:56.5394539Z adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml (deflated 92%) 2025-12-04T16:18:56.5422532Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.xml (deflated 92%) 2025-12-04T16:18:56.5452501Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.xml (deflated 93%) 2025-12-04T16:18:56.5455901Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml (deflated 88%) 2025-12-04T16:18:56.5458113Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml (deflated 89%) 2025-12-04T16:18:56.5460340Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml (deflated 89%) 2025-12-04T16:18:56.5461732Z adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml (deflated 73%) 2025-12-04T16:18:56.5465919Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.xml (deflated 93%) 2025-12-04T16:18:56.5469828Z adding: 
test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.xml (deflated 92%) 2025-12-04T16:18:56.5474228Z adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.xml (deflated 92%) 2025-12-04T16:18:56.5478650Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml (deflated 88%) 2025-12-04T16:18:56.5480640Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml (deflated 89%) 2025-12-04T16:18:56.5482654Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml (deflated 89%) 2025-12-04T16:18:56.5484507Z adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml (deflated 87%) 2025-12-04T16:18:56.5485863Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml (deflated 84%) 2025-12-04T16:18:56.5487110Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml (deflated 80%) 2025-12-04T16:18:56.5488355Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml (deflated 80%) 2025-12-04T16:18:56.5490960Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml (deflated 91%) 2025-12-04T16:18:56.5506144Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml (deflated 95%) 2025-12-04T16:18:56.5520779Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml (deflated 95%) 2025-12-04T16:18:56.5525455Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml (deflated 89%) 2025-12-04T16:18:56.5547542Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml (deflated 95%) 2025-12-04T16:18:56.5548796Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml (deflated 95%) 2025-12-04T16:18:56.5550664Z adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml (deflated 89%) 2025-12-04T16:18:56.5562126Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml (deflated 91%) 2025-12-04T16:18:56.5563503Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml (deflated 55%) 2025-12-04T16:18:56.5569074Z adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml (deflated 90%) 2025-12-04T16:18:56.5570494Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml (deflated 85%) 2025-12-04T16:18:56.5571966Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml (deflated 84%) 2025-12-04T16:18:56.5573420Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml (deflated 84%) 2025-12-04T16:18:56.5574880Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml (deflated 85%) 2025-12-04T16:18:56.5576346Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml (deflated 84%) 2025-12-04T16:18:56.5577804Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml (deflated 84%) 2025-12-04T16:18:56.5579264Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml (deflated 85%) 2025-12-04T16:18:56.5580711Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml (deflated 84%) 2025-12-04T16:18:56.5582176Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml (deflated 84%) 2025-12-04T16:18:56.5583734Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml (deflated 85%) 2025-12-04T16:18:56.5585247Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml (deflated 84%) 2025-12-04T16:18:56.5586786Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml (deflated 84%) 2025-12-04T16:18:56.5588251Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml (deflated 85%) 2025-12-04T16:18:56.5589713Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml (deflated 84%) 2025-12-04T16:18:56.5591178Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml (deflated 84%) 2025-12-04T16:18:56.5592640Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml (deflated 85%) 2025-12-04T16:18:56.5594094Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml (deflated 84%) 2025-12-04T16:18:56.5595560Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml (deflated 84%) 2025-12-04T16:18:56.5597017Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml (deflated 85%) 2025-12-04T16:18:56.5598472Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml (deflated 84%) 2025-12-04T16:18:56.5599938Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml (deflated 84%) 2025-12-04T16:18:56.5601603Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml (deflated 85%) 2025-12-04T16:18:56.5603174Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml (deflated 84%) 2025-12-04T16:18:56.5604637Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml (deflated 84%) 2025-12-04T16:18:56.5606093Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml (deflated 85%) 2025-12-04T16:18:56.5607538Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml (deflated 84%) 2025-12-04T16:18:56.5609007Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml (deflated 84%) 2025-12-04T16:18:56.5610473Z adding: 
test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml (deflated 85%) 2025-12-04T16:18:56.5611933Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml (deflated 84%) 2025-12-04T16:18:56.5613387Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml (deflated 84%) 2025-12-04T16:18:56.5614820Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml (deflated 85%) 2025-12-04T16:18:56.5616343Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml (deflated 84%) 2025-12-04T16:18:56.5617802Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml (deflated 84%) 2025-12-04T16:18:56.5619290Z adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml (deflated 28%) 2025-12-04T16:18:56.5620834Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.xml (deflated 75%) 2025-12-04T16:18:56.5622174Z adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.xml (deflated 61%) 2025-12-04T16:18:56.5623544Z adding: test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.xml (deflated 58%) 2025-12-04T16:18:56.5624891Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml (deflated 85%) 2025-12-04T16:18:56.5626171Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml 
(deflated 81%) 2025-12-04T16:18:56.5627471Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml (deflated 81%) 2025-12-04T16:18:56.5628774Z adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml (deflated 79%) 2025-12-04T16:18:56.5630077Z adding: test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.xml (deflated 87%) 2025-12-04T16:18:56.5631315Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml (deflated 92%) 2025-12-04T16:18:56.5632469Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml (deflated 92%) 2025-12-04T16:18:56.5633728Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml (deflated 92%) 2025-12-04T16:18:56.5634892Z adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml (deflated 83%) 2025-12-04T16:18:56.5636044Z adding: test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.xml (deflated 84%) 2025-12-04T16:18:56.5637284Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml (deflated 89%) 2025-12-04T16:18:56.5649706Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml (deflated 95%) 2025-12-04T16:18:56.5665152Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml (deflated 95%) 2025-12-04T16:18:56.5666540Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml (deflated 87%) 2025-12-04T16:18:56.5682221Z adding: 
test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml (deflated 95%) 2025-12-04T16:18:56.5697792Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml (deflated 95%) 2025-12-04T16:18:56.5699304Z adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml (deflated 86%) 2025-12-04T16:18:56.5700748Z adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.xml (deflated 81%) 2025-12-04T16:18:56.5702929Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml (deflated 93%) 2025-12-04T16:18:56.5704423Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml (deflated 82%) 2025-12-04T16:18:56.5705844Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml (deflated 82%) 2025-12-04T16:18:56.5707400Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml (deflated 82%) 2025-12-04T16:18:56.5708812Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml (deflated 82%) 2025-12-04T16:18:56.5710219Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml (deflated 82%) 2025-12-04T16:18:56.5711634Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml (deflated 85%) 2025-12-04T16:18:56.5713041Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml (deflated 85%) 2025-12-04T16:18:56.5714451Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml (deflated 85%) 2025-12-04T16:18:56.5715866Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml (deflated 85%) 2025-12-04T16:18:56.5717274Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml (deflated 82%) 2025-12-04T16:18:56.5718683Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml (deflated 82%) 2025-12-04T16:18:56.5720074Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml (deflated 82%) 2025-12-04T16:18:56.5721500Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml (deflated 82%) 2025-12-04T16:18:56.5723009Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml (deflated 82%) 2025-12-04T16:18:56.5724421Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml (deflated 82%) 2025-12-04T16:18:56.5725822Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml (deflated 82%) 2025-12-04T16:18:56.5727229Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml (deflated 82%) 2025-12-04T16:18:56.5728638Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml (deflated 82%) 2025-12-04T16:18:56.5730053Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml (deflated 82%) 2025-12-04T16:18:56.5731454Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml (deflated 82%) 2025-12-04T16:18:56.5732859Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml (deflated 86%) 2025-12-04T16:18:56.5734257Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml (deflated 85%) 2025-12-04T16:18:56.5735650Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml (deflated 85%) 2025-12-04T16:18:56.5737052Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml (deflated 84%) 2025-12-04T16:18:56.5738518Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml (deflated 82%) 2025-12-04T16:18:56.5739955Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml (deflated 82%) 2025-12-04T16:18:56.5741420Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml (deflated 84%) 2025-12-04T16:18:56.5742811Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml (deflated 82%) 2025-12-04T16:18:56.5744216Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml (deflated 82%) 2025-12-04T16:18:56.5745615Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml (deflated 84%) 2025-12-04T16:18:56.5747012Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml (deflated 82%) 2025-12-04T16:18:56.5748403Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml (deflated 82%) 2025-12-04T16:18:56.5749810Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml (deflated 82%) 2025-12-04T16:18:56.5751218Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml (deflated 82%) 2025-12-04T16:18:56.5752633Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml (deflated 82%) 2025-12-04T16:18:56.5754034Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml (deflated 86%) 2025-12-04T16:18:56.5755424Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml (deflated 85%) 2025-12-04T16:18:56.5756845Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml (deflated 85%) 2025-12-04T16:18:56.5758260Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml (deflated 82%) 2025-12-04T16:18:56.5759671Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml (deflated 82%) 2025-12-04T16:18:56.5761074Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml (deflated 82%) 2025-12-04T16:18:56.5762546Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml (deflated 82%) 2025-12-04T16:18:56.5763951Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml (deflated 82%) 2025-12-04T16:18:56.5765347Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml (deflated 82%) 2025-12-04T16:18:56.5766741Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml (deflated 82%) 2025-12-04T16:18:56.5768135Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml (deflated 82%) 2025-12-04T16:18:56.5769537Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml (deflated 82%) 2025-12-04T16:18:56.5770969Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml (deflated 82%) 2025-12-04T16:18:56.5772360Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml (deflated 82%) 2025-12-04T16:18:56.5773840Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml (deflated 82%) 2025-12-04T16:18:56.5775235Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml (deflated 82%) 2025-12-04T16:18:56.5776627Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml (deflated 82%) 2025-12-04T16:18:56.5778245Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml (deflated 82%) 2025-12-04T16:18:56.5779832Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml (deflated 82%) 2025-12-04T16:18:56.5781365Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml (deflated 82%) 2025-12-04T16:18:56.5782928Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml (deflated 82%) 2025-12-04T16:18:56.5784429Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml (deflated 83%) 2025-12-04T16:18:56.5785966Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml (deflated 82%) 2025-12-04T16:18:56.5787542Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml (deflated 82%) 2025-12-04T16:18:56.5789079Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml (deflated 82%) 2025-12-04T16:18:56.5790565Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml (deflated 82%) 2025-12-04T16:18:56.5792181Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml (deflated 82%) 2025-12-04T16:18:56.5793714Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml (deflated 85%) 2025-12-04T16:18:56.5795266Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml (deflated 85%) 2025-12-04T16:18:56.5796833Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml (deflated 85%) 2025-12-04T16:18:56.5798314Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml (deflated 84%) 2025-12-04T16:18:56.5799860Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml (deflated 82%) 2025-12-04T16:18:56.5801574Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml (deflated 82%) 2025-12-04T16:18:56.5803181Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml (deflated 82%) 2025-12-04T16:18:56.5804763Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml (deflated 82%) 2025-12-04T16:18:56.5806320Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml (deflated 82%) 2025-12-04T16:18:56.5807849Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml (deflated 83%) 2025-12-04T16:18:56.5809472Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml (deflated 82%) 2025-12-04T16:18:56.5811094Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml (deflated 82%) 2025-12-04T16:18:56.5812575Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml (deflated 82%) 2025-12-04T16:18:56.5814161Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml (deflated 82%) 2025-12-04T16:18:56.5815702Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml (deflated 82%) 2025-12-04T16:18:56.5817272Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml (deflated 88%) 2025-12-04T16:18:56.5818834Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml (deflated 81%) 2025-12-04T16:18:56.5820326Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml (deflated 81%) 2025-12-04T16:18:56.5821865Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml (deflated 91%) 2025-12-04T16:18:56.5823411Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml (deflated 82%) 2025-12-04T16:18:56.5824955Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml (deflated 82%) 2025-12-04T16:18:56.5826523Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml (deflated 90%) 2025-12-04T16:18:56.5828015Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml (deflated 82%) 2025-12-04T16:18:56.5829555Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml (deflated 82%) 2025-12-04T16:18:56.5831106Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml (deflated 85%) 2025-12-04T16:18:56.5832642Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml (deflated 82%) 2025-12-04T16:18:56.5834202Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml (deflated 82%) 2025-12-04T16:18:56.5835706Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml (deflated 85%) 2025-12-04T16:18:56.5837254Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml (deflated 85%) 2025-12-04T16:18:56.5838804Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml (deflated 85%) 2025-12-04T16:18:56.5840349Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml (deflated 84%) 2025-12-04T16:18:56.5841836Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml (deflated 82%) 2025-12-04T16:18:56.5843509Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml (deflated 82%) 2025-12-04T16:18:56.5845127Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml (deflated 83%) 2025-12-04T16:18:56.5846727Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml (deflated 82%) 2025-12-04T16:18:56.5848308Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml (deflated 82%) 2025-12-04T16:18:56.5849795Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml (deflated 84%) 2025-12-04T16:18:56.5851325Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml (deflated 82%) 2025-12-04T16:18:56.5852899Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml (deflated 82%) 2025-12-04T16:18:56.5854432Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml (deflated 82%) 2025-12-04T16:18:56.5855906Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml (deflated 82%) 2025-12-04T16:18:56.5857486Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml (deflated 82%) 2025-12-04T16:18:56.5859015Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml (deflated 85%) 2025-12-04T16:18:56.5860521Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml (deflated 82%) 2025-12-04T16:18:56.5862117Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml (deflated 82%) 2025-12-04T16:18:56.5863597Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml (deflated 82%) 2025-12-04T16:18:56.5865101Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml (deflated 82%) 2025-12-04T16:18:56.5866503Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml (deflated 82%) 2025-12-04T16:18:56.5867925Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml (deflated 86%) 2025-12-04T16:18:56.5869363Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml (deflated 85%) 2025-12-04T16:18:56.5870791Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml (deflated 85%) 2025-12-04T16:18:56.5872222Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml (deflated 85%) 2025-12-04T16:18:56.5873641Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml (deflated 85%) 2025-12-04T16:18:56.5875066Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml (deflated 85%) 2025-12-04T16:18:56.5876490Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml (deflated 82%) 2025-12-04T16:18:56.5877971Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml (deflated 82%) 2025-12-04T16:18:56.5879380Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml (deflated 82%) 2025-12-04T16:18:56.5880899Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml (deflated 84%) 2025-12-04T16:18:56.5882402Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml (deflated 82%) 2025-12-04T16:18:56.5883826Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml (deflated 82%) 2025-12-04T16:18:56.5885254Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml (deflated 82%) 2025-12-04T16:18:56.5886663Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml (deflated 82%) 2025-12-04T16:18:56.5888094Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml (deflated 82%) 2025-12-04T16:18:56.5889526Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml (deflated 82%) 2025-12-04T16:18:56.5890950Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml (deflated 82%) 2025-12-04T16:18:56.5892360Z adding: 
test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml (deflated 82%) 2025-12-04T16:18:56.5893794Z adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml (deflated 77%) 2025-12-04T16:18:56.6060177Z adding: test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.xml (deflated 99%) 2025-12-04T16:18:56.6078822Z adding: test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.xml (deflated 88%) 2025-12-04T16:18:56.6103315Z adding: test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.xml (deflated 95%) 2025-12-04T16:18:56.6112800Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.xml (deflated 91%) 2025-12-04T16:18:56.6123741Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.xml (deflated 91%) 2025-12-04T16:18:56.6133279Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.xml (deflated 91%) 2025-12-04T16:18:56.6143293Z adding: test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.xml (deflated 91%) 2025-12-04T16:18:56.6296585Z adding: test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.xml (deflated 96%) 2025-12-04T16:18:56.6310598Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.xml (deflated 96%) 2025-12-04T16:18:56.6344594Z adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.xml (deflated 98%) 2025-12-04T16:18:56.6413753Z adding: test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.xml (deflated 95%) 2025-12-04T16:18:56.6487397Z adding: test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.xml (deflated 95%) 2025-12-04T16:18:56.6523429Z adding: 
test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.xml (deflated 93%) 2025-12-04T16:18:56.6557226Z adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.xml (deflated 93%) 2025-12-04T16:18:56.6570483Z adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.xml (deflated 96%) 2025-12-04T16:18:56.6577976Z adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.xml (deflated 94%) 2025-12-04T16:18:56.6579448Z adding: test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.xml (deflated 28%) 2025-12-04T16:18:56.6583335Z adding: test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.xml (deflated 90%) 2025-12-04T16:18:56.6584474Z adding: test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.xml (deflated 93%) 2025-12-04T16:18:56.6585672Z adding: test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.xml (deflated 91%) 2025-12-04T16:18:56.6586906Z adding: test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.xml (deflated 61%) 2025-12-04T16:18:56.6588323Z adding: test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.xml (deflated 28%) 2025-12-04T16:18:56.6589758Z adding: test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.xml (deflated 93%) 2025-12-04T16:18:56.6591099Z adding: test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.xml (deflated 68%) 2025-12-04T16:18:56.6592467Z adding: 
test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.xml (deflated 59%) 2025-12-04T16:18:56.6593864Z adding: test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.xml (deflated 95%) 2025-12-04T16:18:56.6595266Z adding: test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.xml (deflated 60%) 2025-12-04T16:18:56.6596642Z adding: test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.xml (deflated 52%) 2025-12-04T16:18:56.6598138Z adding: test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.xml (deflated 71%) 2025-12-04T16:18:56.6599617Z adding: test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.xml (deflated 86%) 2025-12-04T16:18:56.6601102Z adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.xml (deflated 68%) 2025-12-04T16:18:56.6602422Z adding: test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.xml (deflated 64%) 2025-12-04T16:18:56.6603622Z adding: test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.xml (deflated 78%) 2025-12-04T16:18:56.6604983Z adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.xml (deflated 68%) 2025-12-04T16:18:56.6606395Z adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.xml (deflated 83%) 2025-12-04T16:18:56.6607640Z adding: test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.xml (deflated 71%) 2025-12-04T16:18:56.6608929Z adding: 
test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.xml (deflated 73%) 2025-12-04T16:18:56.6622850Z adding: test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.xml (deflated 96%) 2025-12-04T16:18:56.6624027Z adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.xml (deflated 35%) 2025-12-04T16:18:56.6659643Z adding: test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.xml (deflated 95%) 2025-12-04T16:18:56.6661173Z adding: test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.xml (deflated 53%) 2025-12-04T16:18:56.6662713Z adding: test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.xml (deflated 67%) 2025-12-04T16:18:56.6664118Z adding: test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.xml (deflated 82%) 2025-12-04T16:18:56.6671149Z adding: test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.xml (deflated 94%) 2025-12-04T16:18:56.6672519Z adding: test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.xml (deflated 81%) 2025-12-04T16:18:56.6673937Z adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.xml (deflated 59%) 2025-12-04T16:18:56.6675368Z adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.xml (deflated 63%) 2025-12-04T16:18:56.6676830Z adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.xml (deflated 84%) 2025-12-04T16:18:56.6678177Z adding: 
test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.xml (deflated 62%) 2025-12-04T16:18:56.6679519Z adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.xml (deflated 52%) 2025-12-04T16:18:56.6680818Z adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.xml (deflated 61%) 2025-12-04T16:18:56.6693978Z adding: test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.xml (deflated 97%) 2025-12-04T16:18:56.6704028Z adding: test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.xml (deflated 96%) 2025-12-04T16:18:56.6705154Z adding: test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.xml (deflated 68%) 2025-12-04T16:18:56.6706377Z adding: test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.xml (deflated 76%) 2025-12-04T16:18:56.6707614Z adding: test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.xml (deflated 62%) 2025-12-04T16:18:56.6708731Z adding: test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.xml (deflated 62%) 2025-12-04T16:18:56.6710041Z adding: test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.xml (deflated 37%) 2025-12-04T16:18:56.6711695Z adding: test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.xml (deflated 87%) 2025-12-04T16:18:56.6712880Z adding: test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.xml (deflated 77%) 2025-12-04T16:18:56.6714148Z adding: test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.xml (deflated 75%) 2025-12-04T16:18:56.6715392Z adding: 
test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.xml (deflated 70%) 2025-12-04T16:18:56.6827298Z adding: test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.xml (deflated 88%) 2025-12-04T16:18:56.6828448Z adding: test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.xml (deflated 81%) 2025-12-04T16:18:56.6848970Z adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.xml (deflated 97%) 2025-12-04T16:18:56.6850591Z adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.xml (deflated 85%) 2025-12-04T16:18:56.6851903Z adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.xml (deflated 90%) 2025-12-04T16:18:56.6853164Z adding: test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.xml (deflated 72%) 2025-12-04T16:18:56.6855800Z adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.xml (deflated 86%) 2025-12-04T16:18:56.6861193Z adding: test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.xml (deflated 89%) 2025-12-04T16:18:56.6862495Z adding: test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.xml (deflated 66%) 2025-12-04T16:18:56.6864914Z adding: test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.xml (deflated 92%) 2025-12-04T16:18:56.6866203Z adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.xml (deflated 77%) 2025-12-04T16:18:56.6867422Z adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.xml (deflated 93%) 2025-12-04T16:18:56.6868808Z adding: 
test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.xml (deflated 92%) 2025-12-04T16:18:56.6870146Z adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.xml (deflated 64%) 2025-12-04T16:18:56.6873596Z adding: test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.xml (deflated 91%) 2025-12-04T16:18:56.6875378Z adding: test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.xml (deflated 87%) 2025-12-04T16:18:56.6876627Z adding: test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.xml (deflated 28%) 2025-12-04T16:18:56.6877911Z adding: test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.xml (deflated 58%) 2025-12-04T16:18:56.6879296Z adding: test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.xml (deflated 76%) 2025-12-04T16:18:56.6880610Z adding: test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.xml (deflated 67%) 2025-12-04T16:18:56.6881792Z adding: test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.xml (deflated 69%) 2025-12-04T16:18:56.6882982Z adding: test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.xml (deflated 85%) 2025-12-04T16:18:56.6884128Z adding: test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.xml (deflated 92%) 2025-12-04T16:18:56.6885315Z adding: test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.xml (deflated 75%) 2025-12-04T16:18:56.6886479Z adding: test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.xml (deflated 90%) 2025-12-04T16:18:56.6887588Z 
adding: test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.xml (deflated 73%) 2025-12-04T16:18:56.6888689Z adding: test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.xml (deflated 77%) 2025-12-04T16:18:56.6893213Z adding: test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.xml (deflated 97%) 2025-12-04T16:18:56.6894452Z adding: test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.xml (deflated 65%) 2025-12-04T16:18:56.6895726Z adding: test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.xml (deflated 85%) 2025-12-04T16:18:56.6924275Z adding: test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.xml (deflated 95%) 2025-12-04T16:18:56.6925239Z adding: test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.xml (deflated 82%) 2025-12-04T16:18:56.6926339Z adding: test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.xml (deflated 37%) 2025-12-04T16:18:56.6928444Z adding: test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.xml (deflated 86%) 2025-12-04T16:18:56.6929581Z adding: test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.xml (deflated 89%) 2025-12-04T16:18:56.6930674Z adding: test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.xml (deflated 87%) 2025-12-04T16:18:56.6931690Z adding: test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.xml (deflated 68%) 2025-12-04T16:18:56.6932753Z adding: test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.xml (deflated 86%) 2025-12-04T16:18:56.6933866Z adding: test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.xml (deflated 73%) 2025-12-04T16:18:56.6937695Z 
adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.xml (deflated 93%) 2025-12-04T16:18:56.6955465Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.xml (deflated 96%) 2025-12-04T16:18:56.6956713Z adding: test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.xml (deflated 39%) 2025-12-04T16:18:56.6965434Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.xml (deflated 95%) 2025-12-04T16:18:56.6968112Z adding: test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.xml (deflated 96%) 2025-12-04T16:18:56.6970181Z adding: test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.xml (deflated 89%) 2025-12-04T16:18:56.6987176Z adding: test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.xml (deflated 97%) 2025-12-04T16:18:56.6990295Z adding: test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.xml (deflated 91%) 2025-12-04T16:18:56.6992041Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.xml (deflated 95%) 2025-12-04T16:18:56.6993570Z adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.xml (deflated 91%) 2025-12-04T16:18:56.6994941Z adding: test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.xml (deflated 83%) 2025-12-04T16:18:56.6996213Z adding: test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.xml (deflated 64%) 2025-12-04T16:18:56.7000679Z adding: 
test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.xml (deflated 96%) 2025-12-04T16:18:56.7005327Z adding: test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.xml (deflated 90%) 2025-12-04T16:18:56.7035486Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T16:18:56.7036035Z # Remove any previous usage logs if they exist 2025-12-04T16:18:56.7036461Z rm -f logs-*.zip 2025-12-04T16:18:56.7036890Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T16:18:56.7037498Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T16:18:56.7044454Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:56.7044888Z env: 2025-12-04T16:18:56.7045138Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.7045452Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.7045807Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.7046549Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.7047461Z FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T16:18:56.7048029Z ##[endgroup] 2025-12-04T16:18:56.7120962Z adding: usage_log.txt (deflated 58%) 2025-12-04T16:18:56.7196560Z adding: test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log (deflated 92%) 2025-12-04T16:18:56.7197418Z adding: test/test-reports/test_autocast_1.1_7cd62703ceb14b05_.log (deflated 76%) 2025-12-04T16:18:56.7213446Z adding: test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.5_8dad9aa6fdc82df0_.log (deflated 91%) 2025-12-04T16:18:56.7247960Z adding: test/test-reports/test_fx_1.1_fe3aedf5a60597eb_.log (deflated 92%) 2025-12-04T16:18:56.7269536Z adding: 
test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.5_0c7fd80a5a340f9b_.log (deflated 92%) 2025-12-04T16:18:56.7272271Z adding: test/test-reports/test_cpp_extensions_jit_1.1_53eadff4adfe6cf3_.log (deflated 88%) 2025-12-04T16:18:56.7277712Z adding: test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log (deflated 94%) 2025-12-04T16:18:56.7278640Z adding: test/test-reports/profiler.test_python_tracer_1.1_2f036554f4a33837_.log (deflated 59%) 2025-12-04T16:18:56.7288098Z adding: test/test-reports/inductor.test_torchinductor_opinfo_3.17_09d50cf3d15b8ee9_.log (deflated 92%) 2025-12-04T16:18:56.7294277Z adding: test/test-reports/distributions.test_distributions_1.1_10129d86baeaadf5_.log (deflated 90%) 2025-12-04T16:18:56.7303959Z adding: test/test-reports/inductor.test_torchinductor_opinfo_8.17_f4805f992a426064_.log (deflated 92%) 2025-12-04T16:18:56.7304842Z adding: test/test-reports/test_logging_1.1_4a28eee8affd86e2_.log (deflated 49%) 2025-12-04T16:18:56.7311631Z adding: test/test-reports/inductor.test_torchinductor_opinfo_13.17_50bb27b4d6383988_.log (deflated 91%) 2025-12-04T16:18:56.7317737Z adding: test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log (deflated 93%) 2025-12-04T16:18:56.7361451Z adding: test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log (deflated 96%) 2025-12-04T16:18:56.7371273Z adding: test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log (deflated 88%) 2025-12-04T16:18:56.7411347Z adding: test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log (deflated 97%) 2025-12-04T16:18:56.7412295Z adding: test/test-reports/inductor.test_deterministic_1.8_262bcacfdd50a1f9_.log (deflated 65%) 2025-12-04T16:18:56.7413188Z adding: test/test-reports/inductor.test_deterministic_6.8_b1bfd086dab71470_.log (deflated 58%) 2025-12-04T16:18:56.7414117Z adding: test/test-reports/inductor.test_extension_backend_1.1_057698d7e9793b3b_.log (deflated 56%) 
2025-12-04T16:18:56.7416929Z adding: test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log (deflated 92%) 2025-12-04T16:18:56.7418168Z adding: test/test-reports/dynamo.test_fx_graph_runnable_1.1_bc88b60e43fe7f12_.log (deflated 80%) 2025-12-04T16:18:56.7438021Z adding: test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log (deflated 97%) 2025-12-04T16:18:56.7439198Z adding: test/test-reports/dynamo.test_streams_1.1_834a989fad2ef2e3_.log (deflated 79%) 2025-12-04T16:18:56.7479726Z adding: test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log (deflated 96%) 2025-12-04T16:18:56.7480864Z adding: test/test-reports/inductor.test_scatter_optimization_1.1_7430a249406bb12a_.log (deflated 78%) 2025-12-04T16:18:56.7565016Z adding: test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log (deflated 98%) 2025-12-04T16:18:56.7882628Z adding: test/test-reports/test_transformers_1.1_cd619bbaee31992c_.log (deflated 98%) 2025-12-04T16:18:56.7904937Z adding: test/test-reports/test_autograd_1.1_343bbb8e8e4f4e62_.log (deflated 88%) 2025-12-04T16:18:56.7946520Z adding: test/test-reports/test_sparse_1.2_170c4a4cb63931fe_.log (deflated 94%) 2025-12-04T16:18:56.7962410Z adding: test/test-reports/test_decomp_2.17_4858d88ccf44ed88_.log (deflated 89%) 2025-12-04T16:18:56.7979692Z adding: test/test-reports/test_decomp_7.17_ecdc7da48044ddba_.log (deflated 89%) 2025-12-04T16:18:56.7995292Z adding: test/test-reports/test_decomp_12.17_884069b3bca145fc_.log (deflated 89%) 2025-12-04T16:18:56.8012417Z adding: test/test-reports/test_decomp_17.17_4ba2ec57e0bb6714_.log (deflated 89%) 2025-12-04T16:18:56.8227247Z adding: test/test-reports/test_meta_5.5_1a0c05f4e7432569_.log (deflated 93%) 2025-12-04T16:18:56.8242876Z adding: test/test-reports/test_nestedtensor_1.4_6dff2e85dc80cacf_.log (deflated 91%) 2025-12-04T16:18:56.8257695Z adding: test/test-reports/test_nestedtensor_4.4_fadd9c2633e00561_.log (deflated 92%) 
2025-12-04T16:18:56.8343237Z adding: test/test-reports/test_ops_5.11_352ce2577683b96d_.log (deflated 91%) 2025-12-04T16:18:56.8427231Z adding: test/test-reports/test_ops_10.11_9feb13593ea58df6_.log (deflated 91%) 2025-12-04T16:18:56.8467701Z adding: test/test-reports/functorch.test_ops_2.7_066e83f50e6dcbea_.log (deflated 92%) 2025-12-04T16:18:56.8508081Z adding: test/test-reports/functorch.test_ops_7.7_c87f7efa94ae13b4_.log (deflated 92%) 2025-12-04T16:18:56.8508948Z adding: test/test-reports/inductor.test_max_autotune_1.1_dc9c21bc2c4ad5fc_.log (deflated 34%) 2025-12-04T16:18:56.8518080Z adding: test/test-reports/inductor.test_cpu_repro_3.3_41613d465af9d6d5_.log (deflated 93%) 2025-12-04T16:18:56.8521679Z adding: test/test-reports/test_python_dispatch_1.1_4a43d809046600b7_.log (deflated 87%) 2025-12-04T16:18:56.8525811Z adding: test/test-reports/inductor.test_mkldnn_pattern_matcher_2.3_52e8559de495a0be_.log (deflated 92%) 2025-12-04T16:18:56.8526800Z adding: test/test-reports/inductor.test_cpu_select_algorithm_1.1_2b85f4e0fd3f066c_.log (deflated 49%) 2025-12-04T16:18:56.8534605Z adding: test/test-reports/test_custom_ops_1.1_37d60717605e8cfe_.log (deflated 89%) 2025-12-04T16:18:56.8535820Z adding: test/test-reports/inductor.test_analysis_1.1_a128307487ad43a3_.log (deflated 85%) 2025-12-04T16:18:56.8536827Z adding: test/test-reports/inductor.test_pad_mm_1.1_bfb512e8053e306d_.log (deflated 79%) 2025-12-04T16:18:56.8537686Z adding: test/test-reports/inductor.test_triton_syntax_1.1_cd6b570d7971cca9_.log (deflated 51%) 2025-12-04T16:18:56.8539222Z adding: test/test-reports/nn.test_lazy_modules_1.1_641ede76abd1387b_.log (deflated 86%) 2025-12-04T16:18:56.8540140Z adding: test/test-reports/inductor.test_triton_extension_backend_1.1_e218feea67d6cd2a_.log (deflated 50%) 2025-12-04T16:18:56.8541602Z adding: test/test-reports/test_sparse_semi_structured_1.1_4dd53f61ed651a5b_.log (deflated 87%) 2025-12-04T16:18:56.8542503Z adding: 
test/test-reports/inductor.test_op_completeness_1.1_5deb9907383c3460_.log (deflated 65%) 2025-12-04T16:18:56.8543418Z adding: test/test-reports/inductor.test_subgraph_choice_1.1_927735b69ebf1973_.log (deflated 55%) 2025-12-04T16:18:56.8544356Z adding: test/test-reports/inductor.test_cutedsl_grouped_mm_1.1_4f25a6335f622148_.log (deflated 89%) 2025-12-04T16:18:56.8545292Z adding: test/test-reports/inductor.test_cpp_wrapper_hipify_1.1_353d02c262482f20_.log (deflated 61%) 2025-12-04T16:18:56.8546203Z adding: test/test-reports/inductor.test_inductor_utils_1.1_67afa62609840b86_.log (deflated 56%) 2025-12-04T16:18:56.8547037Z adding: test/test-reports/nn.test_pruning_1.1_fc4532e556fbe9d9_.log (deflated 81%) 2025-12-04T16:18:56.8547950Z adding: test/test-reports/inductor.test_template_heuristics_registry_1.1_3f598775c056439a_.log (deflated 71%) 2025-12-04T16:18:56.8548922Z adding: test/test-reports/inductor.test_async_compile_1.1_887cb91e60faea2f_.log (deflated 68%) 2025-12-04T16:18:56.8549917Z adding: test/test-reports/dynamo.test_deque_reconstruct_1.1_f8b7d34594077ea6_.log (deflated 63%) 2025-12-04T16:18:56.8550780Z adding: test/test-reports/inductor.test_utils_1.1_63e5e2174acc542d_.log (deflated 67%) 2025-12-04T16:18:56.8551667Z adding: test/test-reports/inductor.test_indexing_1.1_2bd025888cab1cf8_.log (deflated 78%) 2025-12-04T16:18:56.8552587Z adding: test/test-reports/inductor.test_inductor_annotations_1.1_e129b89bdd73962f_.log (deflated 59%) 2025-12-04T16:18:56.8553615Z adding: test/test-reports/inductor.test_compile_worker_1.1_00f9da717f84f877_.log (deflated 76%) 2025-12-04T16:18:56.8554475Z adding: test/test-reports/dynamo.test_einops_1.1_fa1def1006f21bae_.log (deflated 59%) 2025-12-04T16:18:56.8555361Z adding: test/test-reports/inductor.test_external_callables_1.1_532bdcfa274f54bc_.log (deflated 60%) 2025-12-04T16:18:56.8602253Z adding: test/test-reports/test_testing_1.1_a28c99e40f247370_.log (deflated 94%) 2025-12-04T16:18:56.8603072Z adding: 
test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_7c7f9dd585a9f6c9_.log (deflated 53%) 2025-12-04T16:18:56.8630664Z adding: test/test-reports/export.test_strict_export_v2_1.1_3c4ed2fe1af04b4b_.log (deflated 92%) 2025-12-04T16:18:56.8631495Z adding: test/test-reports/test_monitor_1.1_60acff8e80cf96a3_.log (deflated 62%) 2025-12-04T16:18:56.8632382Z adding: test/test-reports/export.test_functionalized_assertions_1.1_7d17ab73392af6b4_.log (deflated 60%) 2025-12-04T16:18:56.8633362Z adding: test/test-reports/inductor.test_selective_lowering_1.1_e1c78d2a5185c394_.log (deflated 58%) 2025-12-04T16:18:56.8634267Z adding: test/test-reports/dynamo.test_base_output_1.1_c6d6552f20e02364_.log (deflated 67%) 2025-12-04T16:18:56.8635132Z adding: test/test-reports/inductor.test_lookup_table_1.1_47a98ebb9baf620f_.log (deflated 6%) 2025-12-04T16:18:56.8638484Z adding: test/test-reports/export.test_serialize_1.1_aebb5c7eea9352a2_.log (deflated 88%) 2025-12-04T16:18:56.8640704Z adding: test/test-reports/torch_np.numpy_tests.lib.test_shape_base__1.1_462d874ba4c079f0_.log (deflated 87%) 2025-12-04T16:18:56.8641706Z adding: test/test-reports/inductor.test_move_constructors_to_gpu_1.1_3373ad77744fe6e4_.log (deflated 70%) 2025-12-04T16:18:56.8642709Z adding: test/test-reports/inductor.test_remote_cache_1.1_46ddba7c7bb0dd06_.log (deflated 60%) 2025-12-04T16:18:56.8643568Z adding: test/test-reports/test_cuda_sanitizer_1.1_06ff5e3bcde71deb_.log (deflated 80%) 2025-12-04T16:18:56.8644493Z adding: test/test-reports/inductor.test_coordinate_descent_tuner_1.1_ec23ddb0902f120e_.log (deflated 68%) 2025-12-04T16:18:56.8645464Z adding: test/test-reports/inductor.test_inplace_padding_1.1_79ffe73bfaa271da_.log (deflated 67%) 2025-12-04T16:18:56.8646380Z adding: test/test-reports/inductor.test_cudacodecache_1.1_0486dc99f2c38224_.log (deflated 56%) 2025-12-04T16:18:56.8647293Z adding: test/test-reports/inductor.test_minifier_utils_1.1_29e2300addd2b151_.log (deflated 59%) 2025-12-04T16:18:56.8648184Z 
adding: test/test-reports/inductor.test_debug_trace_1.1_9dbcd0e5470fca07_.log (deflated 61%) 2025-12-04T16:18:56.8660111Z adding: test/test-reports/inductor.test_foreach_1.1_72dc555a9d39f8a0_.log (deflated 93%) 2025-12-04T16:18:56.8678775Z adding: test/test-reports/inductor.test_cache_1.1_b15a3258d122eb10_.log (deflated 95%) 2025-12-04T16:18:56.8679587Z adding: test/test-reports/dynamo.test_config_1.1_34b955669d56d548_.log (deflated 62%) 2025-12-04T16:18:56.8680427Z adding: test/test-reports/dynamo.test_metrics_context_1.1_5c0162a494019d34_.log (deflated 72%) 2025-12-04T16:18:56.8681272Z adding: test/test-reports/export.test_package_1.1_c7910f2956ab0b71_.log (deflated 59%) 2025-12-04T16:18:56.8682111Z adding: test/test-reports/dynamo.test_nops_1.1_eec8955a89c0749e_.log (deflated 58%) 2025-12-04T16:18:56.8682906Z adding: test/test-reports/test_bundled_inputs_1.1_395d728a16287961_.log (deflated 73%) 2025-12-04T16:18:56.8683808Z adding: test/test-reports/inductor.test_graph_transform_observer_1.1_2166094392cbcf10_.log (deflated 54%) 2025-12-04T16:18:56.8685087Z adding: test/test-reports/export.test_db_1.1_e88cbc04d8a44796_.log (deflated 82%) 2025-12-04T16:18:56.8685908Z adding: test/test-reports/dynamo.test_export_mutations_1.1_68937c62c4814f0f_.log (deflated 71%) 2025-12-04T16:18:56.8686827Z adding: test/test-reports/inductor.test_config_1.1_8da77f3c96eb0a54_.log (deflated 74%) 2025-12-04T16:18:56.8687690Z adding: test/test-reports/inductor.test_dependencies_1.1_a229a828add2b21e_.log (deflated 67%) 2025-12-04T16:18:56.8688639Z adding: test/test-reports/inductor.test_fuzzer_1.1_7ef41a4207e7fec8_.log (deflated 70%) 2025-12-04T16:18:56.8689444Z adding: test/test-reports/dynamo.test_global_1.1_be67321ce36fdfe2_.log (deflated 73%) 2025-12-04T16:18:56.9424490Z adding: test/test-reports/inductor.test_control_flow_1.4_b6ec092c04daf6c8_.log (deflated 97%) 2025-12-04T16:18:56.9425378Z adding: test/test-reports/dynamo.test_cudagraphs_1.1_f31f593cd6865772_.log (deflated 68%) 
2025-12-04T16:18:56.9426226Z adding: test/test-reports/inductor.test_alignment_1.1_c850ab1c90ef7284_.log (deflated 73%) 2025-12-04T16:18:56.9427068Z adding: test/test-reports/dynamo.test_profiler_1.1_bdf79e2257b8f437_.log (deflated 72%) 2025-12-04T16:18:56.9429247Z adding: test/test-reports/dynamo.test_guard_serialization_1.1_ca95c718e2b65acd_.log (deflated 84%) 2025-12-04T16:18:56.9433095Z adding: test/test-reports/dynamo.test_dicts_1.1_9286d343eb07609f_.log (deflated 87%) 2025-12-04T16:18:56.9433919Z adding: test/test-reports/dynamo.test_optimizers_1.1_6e8896f6f8ab34bf_.log (deflated 56%) 2025-12-04T16:18:56.9456027Z adding: test/test-reports/export.test_torchbind_1.1_2a7aef954986f1ed_.log (deflated 96%) 2025-12-04T16:18:56.9456894Z adding: test/test-reports/dynamo.test_python_dispatcher_1.1_d5e45034fa548233_.log (deflated 69%) 2025-12-04T16:18:56.9457751Z adding: test/test-reports/export.test_swap_1.1_75b32b5d64f61c05_.log (deflated 78%) 2025-12-04T16:18:56.9459164Z adding: test/test-reports/export.test_unflatten_1.1_e240ad71aaf7be43_.log (deflated 78%) 2025-12-04T16:18:56.9460069Z adding: test/test-reports/dynamo.test_verify_correctness_1.1_c32bdac20cc2dbcb_.log (deflated 67%) 2025-12-04T16:18:56.9462836Z adding: test/test-reports/inductor.test_fxir_backend_1.1_615cfb6d9761ce74_.log (deflated 84%) 2025-12-04T16:18:56.9464923Z adding: test/test-reports/dynamo.test_structured_trace_1.1_e2032e57f1fbb9a7_.log (deflated 82%) 2025-12-04T16:18:56.9465799Z adding: test/test-reports/dynamo.test_torchrec_1.1_ef7e4418db36eb14_.log (deflated 49%) 2025-12-04T16:18:56.9466666Z adding: test/test-reports/test_model_exports_to_core_aten_1.1_1858ccc543938d86_.log (deflated 52%) 2025-12-04T16:18:56.9467573Z adding: test/test-reports/dynamo.test_precompile_context_1.1_a5d2ca6b4ab870b9_.log (deflated 60%) 2025-12-04T16:18:56.9468469Z adding: test/test-reports/dynamo.test_trace_rules_1.1_6759ebf57891eeeb_.log (deflated 65%) 2025-12-04T16:18:56.9469303Z adding: 
test/test-reports/export.test_upgrader_1.1_ed15a90621ede266_.log (deflated 66%) 2025-12-04T16:18:56.9470110Z adding: test/test-reports/dynamo.test_hooks_1.1_66426e5cf57243c0_.log (deflated 81%) 2025-12-04T16:18:56.9472278Z adding: test/test-reports/dynamo.test_generator_1.1_f207b5be74916c07_.log (deflated 86%) 2025-12-04T16:18:56.9473107Z adding: test/test-reports/export.test_verifier_1.1_96a0b4295b5beb1c_.log (deflated 71%) 2025-12-04T16:18:56.9476104Z adding: test/test-reports/export.test_sparse_2.2_dc3ae5c04c4515a4_.log (deflated 89%) 2025-12-04T16:18:56.9477243Z adding: test/test-reports/functorch.test_ac_1.1_99b1ba004ab023a0_.log (deflated 68%) 2025-12-04T16:18:56.9478070Z adding: test/test-reports/test_out_dtype_op_1.1_3e48e335f34b8277_.log (deflated 72%) 2025-12-04T16:18:56.9486926Z adding: test/test-reports/torch_np.test_ufuncs_basic_1.1_5b79d2f51b6173f9_.log (deflated 95%) 2025-12-04T16:18:56.9487791Z adding: test/test-reports/lazy.test_step_closures_1.1_f2cf8fda3341fdfb_.log (deflated 62%) 2025-12-04T16:18:56.9488680Z adding: test/test-reports/functorch.dim.test_getsetitem_1.1_f956801402f0c75a_.log (deflated 79%) 2025-12-04T16:18:56.9495880Z adding: test/test-reports/torch_np.numpy_tests.core.test_numeric_1.1_c2ce2dbd13566161_.log (deflated 90%) 2025-12-04T16:18:56.9500436Z adding: test/test-reports/test_indexing_1.1_2824065dc4dc1509_.log (deflated 90%) 2025-12-04T16:18:56.9523203Z adding: test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.1_f5a85c7d65f3960a_.log (deflated 93%) 2025-12-04T16:18:56.9524179Z adding: test/test-reports/test_itt_1.1_0c67806275155360_.log (deflated 49%) 2025-12-04T16:18:56.9526116Z adding: test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_5bba81624a9a4669_.log (deflated 90%) 2025-12-04T16:18:56.9539119Z adding: test/test-reports/torch_np.numpy_tests.lib.test_function_base_1.1_66e1a2bc19dbe7b5_.log (deflated 93%) 2025-12-04T16:18:56.9543941Z adding: test/test-reports/test_masked_1.1_f4f98418cc401a0c_.log 
(deflated 92%) 2025-12-04T16:18:56.9544724Z adding: test/test-reports/optim.test_lrscheduler_1.1_50b469a96bd12a6b_.log (deflated 7%) 2025-12-04T16:18:56.9547068Z adding: test/test-reports/test_datapipe_1.1_628e5e9adba39130_.log (deflated 85%) 2025-12-04T16:18:56.9564330Z adding: test/test-reports/nn.test_convolution_1.1_d98f421ddfbea09e_.log (deflated 95%) 2025-12-04T16:18:56.9565880Z adding: test/test-reports/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility_1.1_38e9912ded2d6880_.log (deflated 87%) 2025-12-04T16:18:56.9608888Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-12-04T16:18:56.9609956Z # Remove any previous debugging artifacts if they exist 2025-12-04T16:18:56.9610772Z rm -f debug-*.zip 2025-12-04T16:18:56.9611301Z if [ -d 'test/debug' ]; then 2025-12-04T16:18:56.9611770Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T16:18:56.9612163Z fi 2025-12-04T16:18:56.9618799Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:18:56.9619240Z env: 2025-12-04T16:18:56.9619485Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.9619794Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.9620161Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.9620800Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.9621614Z FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427 2025-12-04T16:18:56.9622186Z ##[endgroup] 2025-12-04T16:18:56.9712296Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T16:18:56.9712674Z with: 2025-12-04T16:18:56.9712926Z s3-bucket: gha-artifacts 2025-12-04T16:18:56.9713306Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:56.9713717Z retention-days: 14 2025-12-04T16:18:56.9714003Z if-no-files-found: warn 2025-12-04T16:18:56.9714320Z path: test-jsons-*.zip 2025-12-04T16:18:56.9714613Z name: artifact 2025-12-04T16:18:56.9714861Z region: us-east-1 
2025-12-04T16:18:56.9715120Z env: 2025-12-04T16:18:56.9715365Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:56.9715657Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:56.9716023Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:56.9716673Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:56.9717242Z ##[endgroup] 2025-12-04T16:18:57.3691122Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T16:18:57.3691668Z With the provided path, there will be 1 file uploaded 2025-12-04T16:18:57.3692248Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:57.3747183Z Starting upload of test-jsons-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:57.5813652Z Finished upload of test-jsons-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:57.6044540Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T16:18:57.6044933Z with: 2025-12-04T16:18:57.6045194Z s3-bucket: gha-artifacts 2025-12-04T16:18:57.6045738Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:57.6046141Z retention-days: 14 2025-12-04T16:18:57.6046435Z if-no-files-found: error 2025-12-04T16:18:57.6046754Z path: test-reports-*.zip 2025-12-04T16:18:57.6047055Z name: artifact 2025-12-04T16:18:57.6047379Z region: us-east-1 2025-12-04T16:18:57.6047641Z env: 2025-12-04T16:18:57.6047882Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:57.6048174Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:57.6048683Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:57.6049333Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:57.6049901Z ##[endgroup] 2025-12-04T16:18:58.0020851Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T16:18:58.0021402Z With the provided path, there will be 1 file uploaded 2025-12-04T16:18:58.0021923Z Uploading to s3 prefix: 
pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:58.0075669Z Starting upload of test-reports-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:58.2456330Z Finished upload of test-reports-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:58.2664068Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T16:18:58.2664460Z with: 2025-12-04T16:18:58.2664714Z s3-bucket: gha-artifacts 2025-12-04T16:18:58.2665087Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:58.2665488Z retention-days: 14 2025-12-04T16:18:58.2665784Z if-no-files-found: ignore 2025-12-04T16:18:58.2666099Z path: logs-*.zip 2025-12-04T16:18:58.2666351Z name: artifact 2025-12-04T16:18:58.2666617Z region: us-east-1 2025-12-04T16:18:58.2666872Z env: 2025-12-04T16:18:58.2667094Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:58.2667400Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:58.2667771Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:58.2668427Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:58.2669006Z ##[endgroup] 2025-12-04T16:18:58.6362798Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T16:18:58.6363361Z With the provided path, there will be 1 file uploaded 2025-12-04T16:18:58.6363924Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:58.6418113Z Starting upload of logs-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:58.8954813Z Finished upload of logs-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip 2025-12-04T16:18:58.9162320Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T16:18:58.9162701Z with: 2025-12-04T16:18:58.9162956Z s3-bucket: gha-artifacts 2025-12-04T16:18:58.9163329Z s3-prefix: pytorch/pytorch/19922826259/1/artifact 2025-12-04T16:18:58.9163739Z retention-days: 14 
2025-12-04T16:18:58.9164021Z if-no-files-found: ignore 2025-12-04T16:18:58.9164331Z path: debug-*.zip 2025-12-04T16:18:58.9164611Z name: artifact 2025-12-04T16:18:58.9164857Z region: us-east-1 2025-12-04T16:18:58.9165111Z env: 2025-12-04T16:18:58.9165347Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:58.9165641Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:58.9166009Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:58.9166664Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:58.9167230Z ##[endgroup] 2025-12-04T16:18:59.2792857Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T16:18:59.3007666Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T16:18:59.3008120Z # shellcheck disable=SC2156 2025-12-04T16:18:59.3008819Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T16:18:59.3015745Z shell: /usr/bin/bash -e {0} 2025-12-04T16:18:59.3016067Z env: 2025-12-04T16:18:59.3016316Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:18:59.3016747Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:59.3017100Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:59.3017756Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:59.3018415Z ##[endgroup] 2025-12-04T16:18:59.6723415Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T16:18:59.6723999Z with: 2025-12-04T16:18:59.6724421Z name: coredumps-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu 2025-12-04T16:18:59.6724949Z retention-days: 14 2025-12-04T16:18:59.6725247Z if-no-files-found: ignore 2025-12-04T16:18:59.6725548Z path: ./**/core.[1-9]* 2025-12-04T16:18:59.6725848Z s3-bucket: gha-artifacts 2025-12-04T16:18:59.6726162Z region: us-east-1 2025-12-04T16:18:59.6726408Z env: 2025-12-04T16:18:59.6726648Z GIT_DEFAULT_BRANCH: main 
2025-12-04T16:18:59.6726955Z HAS_NVIDIA_GPU: true 2025-12-04T16:18:59.6727308Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:18:59.6727968Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:18:59.6728541Z ##[endgroup] 2025-12-04T16:19:09.0699652Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T16:19:09.1003065Z Prepare all required actions 2025-12-04T16:19:09.1003544Z Getting action download info 2025-12-04T16:19:09.2794618Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T16:19:09.6655773Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T16:19:09.6656213Z with: 2025-12-04T16:19:09.6656464Z job_id: 57119749427 2025-12-04T16:19:09.6657198Z job_name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable) 2025-12-04T16:19:09.6657991Z workflow_name: periodic 2025-12-04T16:19:09.6658310Z workflow_run_id: 19922826259 2025-12-04T16:19:09.6658633Z workflow_attempt: 1 2025-12-04T16:19:09.6658907Z env: 2025-12-04T16:19:09.6659148Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:09.6659458Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:09.6659817Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:09.6660528Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:09.6661098Z ##[endgroup] 2025-12-04T16:19:09.6721579Z ##[group]Run actions/setup-python@v6 2025-12-04T16:19:09.6721947Z with: 2025-12-04T16:19:09.6722291Z python-version: 3.10 2025-12-04T16:19:09.6722602Z check-latest: false 2025-12-04T16:19:09.6723073Z token: *** 2025-12-04T16:19:09.6723333Z update-environment: true 2025-12-04T16:19:09.6723669Z allow-prereleases: false 2025-12-04T16:19:09.6723994Z freethreaded: false 2025-12-04T16:19:09.6724263Z env: 2025-12-04T16:19:09.6724508Z 
GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:09.6724809Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:09.6725158Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:09.6725838Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:09.6726414Z ##[endgroup] 2025-12-04T16:19:09.8415864Z ##[group]Installed versions 2025-12-04T16:19:09.8426223Z Version 3.10 was not found in the local cache 2025-12-04T16:19:09.8625094Z (node:341813) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T16:19:09.8626045Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T16:19:10.2372067Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T16:19:10.2540733Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T16:19:10.2541247Z with: 2025-12-04T16:19:10.2541467Z env: 2025-12-04T16:19:10.2541814Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:10.2542130Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:10.2542503Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:10.2543147Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:10.2543798Z ##[endgroup] 2025-12-04T16:19:10.2561606Z ##[group]Run set -eou pipefail 2025-12-04T16:19:10.2562140Z set -eou pipefail 2025-12-04T16:19:10.2562455Z  2025-12-04T16:19:10.2562878Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T16:19:10.2563426Z for _ in $(seq 1440); do 2025-12-04T16:19:10.2563799Z  # Break if no ssh session exists anymore 2025-12-04T16:19:10.2564207Z  if [ "$(who)" = "" ]; then 2025-12-04T16:19:10.2564592Z  break 2025-12-04T16:19:10.2564862Z  fi 2025-12-04T16:19:10.2565119Z  
echo "." 2025-12-04T16:19:10.2565397Z  sleep 5 2025-12-04T16:19:10.2565657Z done 2025-12-04T16:19:10.2572854Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:19:10.2573301Z env: 2025-12-04T16:19:10.2573538Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:10.2573849Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:10.2574219Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:10.2574857Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:10.2575436Z ##[endgroup] 2025-12-04T16:19:10.2608761Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T16:19:10.2693970Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T16:19:10.2694622Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T16:19:10.2695146Z # shellcheck disable=SC2046 2025-12-04T16:19:10.2695541Z docker stop $(docker ps -q) || true 2025-12-04T16:19:10.2695949Z # Prune all of the docker images 2025-12-04T16:19:10.2696332Z docker system prune -af 2025-12-04T16:19:10.2703036Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:19:10.2703490Z env: 2025-12-04T16:19:10.2703725Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:10.2704048Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:10.2704414Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:10.2705057Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:10.2705640Z ##[endgroup] 2025-12-04T16:19:21.2713665Z 428ca50ff249 2025-12-04T16:19:25.9719388Z Deleted Containers: 2025-12-04T16:19:25.9719896Z 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:25.9720349Z 2025-12-04T16:19:34.0788758Z Deleted Images: 2025-12-04T16:19:34.0789834Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 
2025-12-04T16:19:34.0791372Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940 2025-12-04T16:19:34.0792450Z deleted: sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024 2025-12-04T16:19:34.0793221Z deleted: sha256:f57a578c46f36a858c2be92210a89558688ee36b619af78c698952c0e3ef05ad 2025-12-04T16:19:34.0793988Z deleted: sha256:ce0698bd1efc811ccead0ecdad944b4839bf17bff387495b58e64cf8db0e210c 2025-12-04T16:19:34.0794763Z deleted: sha256:f0ee66f328fa98c40f336c64fee9a4b42e51a793cceea7f81932068bdc7bd315 2025-12-04T16:19:34.0795513Z deleted: sha256:ea24b30a25c161bd4bd564bfd90c36d88674a1aa59ef3e65647e926c76685be0 2025-12-04T16:19:34.0796278Z deleted: sha256:15bc0847ce5e60cc1a9b36d25283dc5648fb45e04aa9a8dec984af3c193e2f0b 2025-12-04T16:19:34.0798535Z deleted: sha256:3639aa26691090ef45641c75bffcb2e3f427f5e282abc93d607de4433bf90488 2025-12-04T16:19:34.0799348Z deleted: sha256:86258272ba477934c917d08b21e0da6000c268b60f5a9ae907038e7bf3236532 2025-12-04T16:19:34.0800197Z deleted: sha256:ba8e0040c98ddbf87acbc3ae6575b2933c09421ac7094a96e027d1fc9356fbb6 2025-12-04T16:19:34.0801144Z deleted: sha256:ca0176fc0de6cc059c4dbfc313434b5dea2c90dc24f2dc3a1061b941c7b3e6ca 2025-12-04T16:19:34.0801910Z deleted: sha256:cc6a480ab9e6091c6c206bc9b340611b3863258975e835769bd8f2a38b5d8c13 2025-12-04T16:19:34.0802823Z deleted: sha256:8465c24f0b284d8589ea191edeb80d1da07e4a59dfcfdcfa153bdf3d5d678d3e 2025-12-04T16:19:34.0803592Z deleted: sha256:b93bfbd3b55899c606fb98c5edbd21fd63114862a4f5a5b67c7aa63fc9ada9a3 2025-12-04T16:19:34.0804360Z deleted: sha256:6b7582e3ce445d82e9d2ae7769502119c39c1edbf5fe11c195615db8da846931 2025-12-04T16:19:34.0805097Z deleted: sha256:9d79615a9d9ae67110cc9da697933492b385b1e4708d30c2211625bea5d42f27 2025-12-04T16:19:34.0805940Z deleted: sha256:7132c6db5e7d5692786167dfb22dea62d8203dc7837b2d1de435c6e5c85e906e 2025-12-04T16:19:34.0806690Z deleted: 
sha256:d61bc13a0957d633ff633186c6cbdf48da1c551991d814281262e58709e225a8 2025-12-04T16:19:34.0807561Z deleted: sha256:0c348bbc3988acd329b3e42de4d2c73d5dc4942618716ca312d389d4f704f4bb 2025-12-04T16:19:34.0808302Z deleted: sha256:28d30dd15686ab6819c2f03388c9999bbdaef35e8756817297d795e00dd623fc 2025-12-04T16:19:34.0809056Z deleted: sha256:0a57608df6cffb31a0b24f2537b4dfe7a55bbe6ea02216703cc3172062ab9d75 2025-12-04T16:19:34.0809826Z deleted: sha256:43d23f49f4d70a54b4aff6f4f10d5c5a3d75b100abbbf281ad510177cc80cd99 2025-12-04T16:19:34.0810589Z deleted: sha256:f9e33c2e4c7b8e7179fba052da4d7c4acdc8287f253c95328ae04055755f88a4 2025-12-04T16:19:34.0811342Z deleted: sha256:cfce0930cf33c7136fc92511b9bcad570958363b55e9e0c82e9b8ebc29301356 2025-12-04T16:19:34.0812098Z deleted: sha256:9a709ae20528f500f51271ad2ce6a3d7196fe814a28ae73881901ecef9748c2a 2025-12-04T16:19:34.0812852Z deleted: sha256:68a1d16e9392be6fe939a58c5f941a0919408b5852e52cb04027b0b8777e2b0e 2025-12-04T16:19:34.0813587Z deleted: sha256:042a0022b3eea78f54015f4cf2888bcfa3b91deb0b08830a33c2814b93285dd9 2025-12-04T16:19:34.0814344Z deleted: sha256:a7ba703ff0aa305a608f3b4afd89c2ecd0d1244b127629145a2e691490abb271 2025-12-04T16:19:34.0815119Z deleted: sha256:be44f5fbae55066faba60eebf7065a082abf517ab8f2ebf8ece69e74d45def07 2025-12-04T16:19:34.0815957Z deleted: sha256:a01f1b0d88a8936d648f78787f56579bdb6617edf4620d0410ab6b118351bbb2 2025-12-04T16:19:34.0816902Z deleted: sha256:dc93f45553adafb5c6e7473711c833996f6884dab2da708ffc76b5cf65b8db9d 2025-12-04T16:19:34.0817933Z deleted: sha256:ffdba9ecb5890a9cb23368d781ff5484270b7f13c6d5629feca3512b58b9a0ac 2025-12-04T16:19:34.0818910Z deleted: sha256:268a91c420865628895871795b524436f5cc4403aa53d71f457db21bf42dd530 2025-12-04T16:19:34.0819659Z deleted: sha256:72450bfd97986ccc53d8fa76252130b464fdb3c5fd8e688546e8c3ce0b9d4394 2025-12-04T16:19:34.0820423Z deleted: sha256:63954235d3be0420af6ad2dae2b24849e3eee1edb10cf86d29137c3e19621f47 2025-12-04T16:19:34.0821185Z deleted: 
sha256:1c4e2d3e68e8a166d1965962077fe194ea00cad2ee636399c0c17ba5a94bdb9c 2025-12-04T16:19:34.0821957Z deleted: sha256:361cacbab7154a0cb62486f57d75b112feedbcc751a7d8f7bb02ec7a61b1fe0d 2025-12-04T16:19:34.0822730Z deleted: sha256:e653f6af92265f4300717bd617aab954cfbf049d4be32e890e57c2e8135be7f9 2025-12-04T16:19:34.0823491Z deleted: sha256:bfffeb2974ffc58c0669724812f701df860257ac3d047a7315a100beb0ea0507 2025-12-04T16:19:34.0824242Z deleted: sha256:6ae48d8efc75420f721058928fe8b1ccf48aa1bdc92de539b1f0db9248a41fcf 2025-12-04T16:19:34.0825006Z deleted: sha256:535c7026785a690366fc69ecbc9a81f1b58a46f63c782620591c1297406a2731 2025-12-04T16:19:34.0825777Z deleted: sha256:8462076c3cc8db6030f38e1137bfbef1aad85404ed4231285c1e06cd414d3e57 2025-12-04T16:19:34.0826539Z deleted: sha256:fe340d63ccb66e5b395b7900c1002a513e4afd7f610e9df5e7262c4f71e93bef 2025-12-04T16:19:34.0827278Z deleted: sha256:b61085386114396fe42144a4aa739b2a0b45f0c30a083462a2ea7b9b675c02aa 2025-12-04T16:19:34.0828237Z deleted: sha256:7772f25c05bcd5ede631d287b826aa108db67c773e377db98ffa73b0917f3629 2025-12-04T16:19:34.0829004Z deleted: sha256:3ea8a43d8193d05ecd6aa473b523a3569e11ae691eed9e6ffd693f23b0106035 2025-12-04T16:19:34.0829802Z deleted: sha256:34647b4087d29cf48a18668bb935a95fc8b2dac3522c2581397f0f27227047fd 2025-12-04T16:19:34.0830608Z deleted: sha256:b6a169f1ab01281c16562ad43b462a1a47a33be8d3cfae0a117ffa5c47d0b532 2025-12-04T16:19:34.0831372Z deleted: sha256:664173a33cd21248a2d73d2eba7887602e36fbc96002d991eb0bd0a2d574ac88 2025-12-04T16:19:34.0832424Z deleted: sha256:d67fdfe94c9a0228f17991cd3e958e36da96d4d597b46773cb7eed98c489f947 2025-12-04T16:19:34.0833461Z deleted: sha256:f2be0722250908742f067756b56ed3fa169daa2f1c8201a7ed4335b2fed2cae5 2025-12-04T16:19:34.0834712Z deleted: sha256:8614db257d8dc9e0f0ee8398a4a4d3c061b2797d6017daaf0696dd7f87633b3e 2025-12-04T16:19:34.0835476Z deleted: sha256:23ee0908a1bf254f1d4dd0591cc0c6801571b4d93950b6fd4fee57ca7e361da0 2025-12-04T16:19:34.0836245Z deleted: 
sha256:f627a99df4c0f370bd7fc8ea6be7695d8027f988aed52b65233cbcf78b01989b 2025-12-04T16:19:34.0836988Z deleted: sha256:d5e92389b59d4134cdb96113af964186602e98c392e76a8f26d4ea6e54056ccc 2025-12-04T16:19:34.0837751Z deleted: sha256:cbfccf44b9dc670c109634fbf19c2bfff2a3d5243bfa351c851d9fad3f1acfc2 2025-12-04T16:19:34.0838518Z deleted: sha256:1242535e81ad4bd713910a6c5e1b38375b12ed1bcd1b48419813a5ef28a5c84c 2025-12-04T16:19:34.0839271Z deleted: sha256:10b1394079cfe756a1ad9aa9aa3a2995bd5e46ef1e18029eb9eae0398f6d4e88 2025-12-04T16:19:34.0840015Z deleted: sha256:1d32da9a5f10e10c4a97a839151a1943d4db18494e8080bea91a6c9784fde067 2025-12-04T16:19:34.0840770Z deleted: sha256:af2fd59653ebd685a032ef800f8227c0d7b9b0e5ef397b30d4301e001c943e8b 2025-12-04T16:19:34.0841535Z deleted: sha256:c48d351980e3bd24d533ae55d1acc6a27911dffcbb03b2ae552d7ccc3e4cd74f 2025-12-04T16:19:34.0842342Z deleted: sha256:e663afac609b1b6c812ab45265c27d870b92c9fc6849939f0b8635da83cbfb53 2025-12-04T16:19:34.0843094Z deleted: sha256:f79dc17668331d4214ef24000d5c54a0bb2ba70f152d8523f571e2b76a303f4f 2025-12-04T16:19:34.0843853Z deleted: sha256:00de9606a6cd2a2dfb4ceffcb076474d027a1f6273894677090aee7478035865 2025-12-04T16:19:34.0844619Z deleted: sha256:cf35fe1d0317253b75ee17c12783c2561faebf9bf2c59c07ad4712c053246586 2025-12-04T16:19:34.0845358Z deleted: sha256:06622801490739d9db884c23c05a31a1ee86c41e888b34c3ccef23d37f2bdbb5 2025-12-04T16:19:34.0846118Z deleted: sha256:df5dafcaee865ddfb66e22075c63769836e01a627d6fe46658b6f4b4a25318d3 2025-12-04T16:19:34.0846890Z deleted: sha256:7949ae5c4df921feb0e2cd7bac1e402e1ab9135e758fa41cd567880b354b40bc 2025-12-04T16:19:34.0847655Z deleted: sha256:9f19148d820adb1d6e86d0ce68e21fbcedafa7c7ec6c45c9004fa3a607096923 2025-12-04T16:19:34.0848427Z deleted: sha256:1d37d963e85ce22ffaab56a1cf35b3411f34f9432dc5e49ebbdf6f30816cdfa8 2025-12-04T16:19:34.0849198Z deleted: sha256:bac6d91e3830e51e96879deaa3e6d0d39da076fa802ebda68f81bdf7ef8342d5 2025-12-04T16:19:34.0849959Z deleted: 
sha256:ffd496b07151c90e7ddd68a81a36471f51a544187982db5e34621358e1b29681 2025-12-04T16:19:34.0850711Z deleted: sha256:890b2042bdb9e22a614cea1be88366cd3ae15159bf78ac510b9daa6f802493a6 2025-12-04T16:19:34.0851475Z deleted: sha256:ddd9a57b20a8b45ae0e8e350ec266d50a1b9e9a7ff4921470eb38f004d50eb20 2025-12-04T16:19:34.0852238Z deleted: sha256:2f4f91684b8221bc5cbc3f14c7e00bb693854027a1a6de5ad6bdcd000bb579f2 2025-12-04T16:19:34.0852986Z deleted: sha256:9c01ec5e73233284a0f9bb42de59696a1fa61caacacdf63d04df5ebd73895d77 2025-12-04T16:19:34.0853745Z deleted: sha256:f6153a90f0f5316b03f1464826325a1578231b89b3c1f1c83cc7cebdd41cee2a 2025-12-04T16:19:34.0854494Z deleted: sha256:4e89cd2181813af7fd2219923bae493e33111d8b4ebd76f257b7fb26744fda28 2025-12-04T16:19:34.0855256Z deleted: sha256:a0b77eb4054db8f2ea2ec957b3941b4aeee14b59e94a99a1521f90d6e41faf0e 2025-12-04T16:19:34.0855999Z deleted: sha256:1a1b2848f15aa5114f5a67e3705439512880bf1a7a6436cc67760c59b5f10c46 2025-12-04T16:19:34.0856735Z deleted: sha256:004fc01362840c164664c18580e479546fa0b7f9599487558f80190aec30e2b5 2025-12-04T16:19:34.0857590Z deleted: sha256:35f36e20799f0a0dead81bc3701732e43489264e6bee9fcb789b376a99e17e78 2025-12-04T16:19:34.0858347Z deleted: sha256:1207fd2ede86015c3f105620cb491e8199d2060a4a87490de358286d0ae52e4e 2025-12-04T16:19:34.0859096Z deleted: sha256:02dccb85ee744d1fbb819c6da618b2c52a3e4affc89e407f79b875e7b3bbb7df 2025-12-04T16:19:34.0859917Z deleted: sha256:d22e6ff9c3ac9dabbcc6052e1459f8dc4ebd19bd057bd0688615d6cc3ebb5cf0 2025-12-04T16:19:34.0860685Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T16:19:34.0861362Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T16:19:34.0862201Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T16:19:34.0863192Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T16:19:34.0863965Z deleted: 
sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T16:19:34.0864730Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 2025-12-04T16:19:34.0865497Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T16:19:34.0866267Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T16:19:34.0867012Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T16:19:34.0867753Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T16:19:34.0868501Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T16:19:34.0868951Z 2025-12-04T16:19:34.0869087Z Total reclaimed space: 35.13GB 2025-12-04T16:19:34.0905293Z ##[group]Run set +e 2025-12-04T16:19:34.0905693Z set +e 2025-12-04T16:19:34.0905948Z set -x 2025-12-04T16:19:34.0906203Z  2025-12-04T16:19:34.0906447Z nvidia-smi 2025-12-04T16:19:34.0906983Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in 2025-12-04T16:19:34.0907804Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T16:19:34.0908603Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T16:19:34.0909224Z # would be missing (ERR!), for example: 2025-12-04T16:19:34.0909592Z # 2025-12-04T16:19:34.0909952Z # +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.0910588Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T16:19:34.0911255Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T16:19:34.0911888Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T16:19:34.0912579Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
| 2025-12-04T16:19:34.0913143Z # | | | MIG M. | 2025-12-04T16:19:34.0913563Z # |===============================+======================+======================| 2025-12-04T16:19:34.0914053Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T16:19:34.0914620Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T16:19:34.0915141Z # | | | ERR! | 2025-12-04T16:19:34.0915633Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T16:19:34.0916084Z # 2025-12-04T16:19:34.0916437Z # +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.0916988Z # | Processes: | 2025-12-04T16:19:34.0917539Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T16:19:34.0918071Z # | ID ID Usage | 2025-12-04T16:19:34.0918605Z # |=============================================================================| 2025-12-04T16:19:34.0919104Z # +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.0919537Z # 2025-12-04T16:19:34.0920040Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T16:19:34.0920641Z # Docker tries to run with --gpus all 2025-12-04T16:19:34.0921015Z # 2025-12-04T16:19:34.0921428Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T16:19:34.0922130Z # GPU name, so that the command can fail accordingly 2025-12-04T16:19:34.0922715Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T16:19:34.0923203Z NVIDIA_SMI_STATUS=$? 2025-12-04T16:19:34.0923517Z  2025-12-04T16:19:34.0924029Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T16:19:34.0924807Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T16:19:34.0925493Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 
2025-12-04T16:19:34.0926100Z  .github/scripts/stop_runner_service.sh 2025-12-04T16:19:34.0926488Z fi 2025-12-04T16:19:34.0926720Z  2025-12-04T16:19:34.0927293Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T16:19:34.0928039Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T16:19:34.0928662Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T16:19:34.0929161Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T16:19:34.0929579Z NVIDIA_SMI_STATUS=$? 2025-12-04T16:19:34.0929886Z  2025-12-04T16:19:34.0930379Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T16:19:34.0931151Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T16:19:34.0931844Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T16:19:34.0932444Z  .github/scripts/stop_runner_service.sh 2025-12-04T16:19:34.0932813Z fi 2025-12-04T16:19:34.0933058Z  2025-12-04T16:19:34.0933346Z # Check the GPU count to be a power of 2 2025-12-04T16:19:34.0934004Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T16:19:34.0934885Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 
2025-12-04T16:19:34.0935560Z  .github/scripts/stop_runner_service.sh 2025-12-04T16:19:34.0935943Z fi 2025-12-04T16:19:34.0944872Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:19:34.0945329Z env: 2025-12-04T16:19:34.0945581Z GIT_DEFAULT_BRANCH: main 2025-12-04T16:19:34.0945876Z HAS_NVIDIA_GPU: true 2025-12-04T16:19:34.0946242Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T16:19:34.0946902Z DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 2025-12-04T16:19:34.0947482Z ##[endgroup] 2025-12-04T16:19:34.0979436Z + nvidia-smi 2025-12-04T16:19:34.1179799Z Thu Dec 4 16:19:34 2025 2025-12-04T16:19:34.1180241Z +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.1180853Z | NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 | 2025-12-04T16:19:34.1181433Z |-------------------------------+----------------------+----------------------+ 2025-12-04T16:19:34.1182044Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T16:19:34.1182694Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T16:19:34.1183327Z | | | MIG M. 
| 2025-12-04T16:19:34.1183734Z |===============================+======================+======================| 2025-12-04T16:19:34.1343049Z | 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T16:19:34.1343584Z | N/A 25C P8 16W / 70W | 2MiB / 15360MiB | 0% Default | 2025-12-04T16:19:34.1344039Z | | | N/A | 2025-12-04T16:19:34.1344508Z +-------------------------------+----------------------+----------------------+ 2025-12-04T16:19:34.1344978Z 2025-12-04T16:19:34.1345443Z +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.1345941Z | Processes: | 2025-12-04T16:19:34.1346470Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T16:19:34.1346959Z | ID ID Usage | 2025-12-04T16:19:34.1347367Z |=============================================================================| 2025-12-04T16:19:34.1348847Z | No running processes found | 2025-12-04T16:19:34.1349429Z +-----------------------------------------------------------------------------+ 2025-12-04T16:19:34.2169433Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T16:19:34.2346645Z Tesla T4 2025-12-04T16:19:34.2385087Z + NVIDIA_SMI_STATUS=0 2025-12-04T16:19:34.2385409Z + '[' 0 -ne 0 ']' 2025-12-04T16:19:34.2391857Z ++ nvidia-smi --list-gpus 2025-12-04T16:19:34.2392517Z ++ wc -l 2025-12-04T16:19:34.2588979Z + GPU_COUNT=1 2025-12-04T16:19:34.2589256Z + NVIDIA_SMI_STATUS=0 2025-12-04T16:19:34.2589545Z + '[' 0 -ne 0 ']' 2025-12-04T16:19:34.2589833Z + '[' 1 -le 8 ']' 2025-12-04T16:19:34.2590082Z + '[' 1 -ne 1 ']' 2025-12-04T16:19:34.2684243Z Post job cleanup. 2025-12-04T16:19:34.2771834Z Post job cleanup. 2025-12-04T16:19:34.2823392Z Post job cleanup. 
2025-12-04T16:19:34.3953872Z [command]/usr/bin/git version 2025-12-04T16:19:34.4016742Z git version 2.50.1 2025-12-04T16:19:34.4055669Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/9fc44b29-008f-4bd6-acfc-6f30b31731c8/.gitconfig' 2025-12-04T16:19:34.4065179Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/9fc44b29-008f-4bd6-acfc-6f30b31731c8' before making global git config changes 2025-12-04T16:19:34.4066416Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T16:19:34.4070800Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T16:19:34.4113438Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T16:19:34.4154680Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T16:19:34.4497899Z Entering 'android/libs/fbjni' 2025-12-04T16:19:34.4561943Z Entering 'third_party/FP16' 2025-12-04T16:19:34.4624861Z Entering 'third_party/FXdiv' 2025-12-04T16:19:34.4690950Z Entering 'third_party/NNPACK' 2025-12-04T16:19:34.4754083Z Entering 'third_party/NVTX' 2025-12-04T16:19:34.4819978Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T16:19:34.4883175Z Entering 'third_party/XNNPACK' 2025-12-04T16:19:34.4965716Z Entering 'third_party/aiter' 2025-12-04T16:19:34.5029369Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T16:19:34.5101776Z Entering 'third_party/benchmark' 2025-12-04T16:19:34.5164752Z Entering 'third_party/composable_kernel' 2025-12-04T16:19:34.5241075Z Entering 'third_party/cpp-httplib' 2025-12-04T16:19:34.5304489Z Entering 'third_party/cpuinfo' 2025-12-04T16:19:34.5367206Z Entering 'third_party/cudnn_frontend' 2025-12-04T16:19:34.5430714Z Entering 'third_party/cutlass' 2025-12-04T16:19:34.5505795Z 
Entering 'third_party/fbgemm' 2025-12-04T16:19:34.5570170Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T16:19:34.5632437Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T16:19:34.5702926Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T16:19:34.5765200Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T16:19:34.5838244Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T16:19:34.5901665Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T16:19:34.5962739Z Entering 'third_party/fbgemm/external/json' 2025-12-04T16:19:34.6027722Z Entering 'third_party/flash-attention' 2025-12-04T16:19:34.6091978Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T16:19:34.6160319Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T16:19:34.6233393Z Entering 'third_party/flatbuffers' 2025-12-04T16:19:34.6299488Z Entering 'third_party/fmt' 2025-12-04T16:19:34.6363441Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T16:19:34.6428365Z Entering 'third_party/gloo' 2025-12-04T16:19:34.6492023Z Entering 'third_party/googletest' 2025-12-04T16:19:34.6555633Z Entering 'third_party/ideep' 2025-12-04T16:19:34.6617088Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T16:19:34.6687290Z Entering 'third_party/ittapi' 2025-12-04T16:19:34.6750059Z Entering 'third_party/kineto' 2025-12-04T16:19:34.6814718Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T16:19:34.6876258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T16:19:34.6940213Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T16:19:34.7001519Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T16:19:34.7063521Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T16:19:34.7126249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 
2025-12-04T16:19:34.7191400Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T16:19:34.7254918Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T16:19:34.7318054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T16:19:34.7381188Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T16:19:34.7445401Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T16:19:34.7507899Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:34.7571313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T16:19:34.7638514Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T16:19:34.7700564Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T16:19:34.7765947Z Entering 'third_party/kleidiai' 2025-12-04T16:19:34.7831379Z Entering 'third_party/mimalloc' 2025-12-04T16:19:34.7894741Z Entering 'third_party/nlohmann' 2025-12-04T16:19:34.7963067Z Entering 'third_party/onnx' 2025-12-04T16:19:34.8046708Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T16:19:34.8111064Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T16:19:34.8180457Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T16:19:34.8244391Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T16:19:34.8307804Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T16:19:34.8370261Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T16:19:34.8434468Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T16:19:34.8496036Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T16:19:34.8557206Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T16:19:34.8618587Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:34.8682417Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T16:19:34.8745905Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T16:19:34.8833110Z Entering 'third_party/pocketfft' 2025-12-04T16:19:34.8896167Z Entering 'third_party/protobuf' 2025-12-04T16:19:34.8962246Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T16:19:34.9023301Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T16:19:34.9086683Z Entering 'third_party/psimd' 2025-12-04T16:19:34.9149739Z Entering 'third_party/pthreadpool' 2025-12-04T16:19:34.9211742Z Entering 'third_party/pybind11' 2025-12-04T16:19:34.9275296Z Entering 'third_party/python-peachpy' 2025-12-04T16:19:34.9337588Z Entering 'third_party/sleef' 2025-12-04T16:19:34.9401725Z Entering 'third_party/tensorpipe' 2025-12-04T16:19:34.9464636Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T16:19:34.9527263Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T16:19:34.9589974Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T16:19:34.9670423Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T16:19:34.9730749Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T16:19:34.9818291Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T16:19:34.9842729Z http.https://github.com/.extraheader 2025-12-04T16:19:34.9854432Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T16:19:34.9890826Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 
'http.https://github.com/.extraheader' || :" 2025-12-04T16:19:35.0226313Z Entering 'android/libs/fbjni' 2025-12-04T16:19:35.0269804Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0308596Z Entering 'third_party/FP16' 2025-12-04T16:19:35.0351134Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0389014Z Entering 'third_party/FXdiv' 2025-12-04T16:19:35.0432136Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0470060Z Entering 'third_party/NNPACK' 2025-12-04T16:19:35.0515695Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0555654Z Entering 'third_party/NVTX' 2025-12-04T16:19:35.0598666Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0638748Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T16:19:35.0681554Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0719010Z Entering 'third_party/XNNPACK' 2025-12-04T16:19:35.0762249Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0817527Z Entering 'third_party/aiter' 2025-12-04T16:19:35.0860448Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0902970Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T16:19:35.0944964Z http.https://github.com/.extraheader 2025-12-04T16:19:35.0994031Z Entering 'third_party/benchmark' 2025-12-04T16:19:35.1036751Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1074544Z Entering 'third_party/composable_kernel' 2025-12-04T16:19:35.1117446Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1166569Z Entering 'third_party/cpp-httplib' 2025-12-04T16:19:35.1210028Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1248342Z Entering 'third_party/cpuinfo' 2025-12-04T16:19:35.1291685Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1333529Z Entering 'third_party/cudnn_frontend' 2025-12-04T16:19:35.1376906Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1415745Z Entering 'third_party/cutlass' 2025-12-04T16:19:35.1457795Z http.https://github.com/.extraheader 
2025-12-04T16:19:35.1507458Z Entering 'third_party/fbgemm' 2025-12-04T16:19:35.1550891Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1592093Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T16:19:35.1633964Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1671314Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T16:19:35.1713897Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1760280Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T16:19:35.1805717Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1843714Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T16:19:35.1885334Z http.https://github.com/.extraheader 2025-12-04T16:19:35.1933890Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T16:19:35.1976128Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2013859Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T16:19:35.2055204Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2098493Z Entering 'third_party/fbgemm/external/json' 2025-12-04T16:19:35.2141215Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2181210Z Entering 'third_party/flash-attention' 2025-12-04T16:19:35.2224214Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2262606Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T16:19:35.2304987Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2349728Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T16:19:35.2391924Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2440157Z Entering 'third_party/flatbuffers' 2025-12-04T16:19:35.2482351Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2523726Z Entering 'third_party/fmt' 2025-12-04T16:19:35.2566263Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2605635Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T16:19:35.2648174Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2687639Z Entering 
'third_party/gloo' 2025-12-04T16:19:35.2729798Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2767856Z Entering 'third_party/googletest' 2025-12-04T16:19:35.2811087Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2849851Z Entering 'third_party/ideep' 2025-12-04T16:19:35.2891777Z http.https://github.com/.extraheader 2025-12-04T16:19:35.2928658Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T16:19:35.2969073Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3016064Z Entering 'third_party/ittapi' 2025-12-04T16:19:35.3058995Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3096645Z Entering 'third_party/kineto' 2025-12-04T16:19:35.3139079Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3177298Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T16:19:35.3219420Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3257134Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T16:19:35.3299972Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3340357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T16:19:35.3383639Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3422799Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T16:19:35.3464024Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3502350Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T16:19:35.3543420Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3579792Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T16:19:35.3622603Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3662002Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T16:19:35.3704633Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3741903Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T16:19:35.3785073Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3823840Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T16:19:35.3866069Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3905574Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T16:19:35.3946880Z http.https://github.com/.extraheader 2025-12-04T16:19:35.3984248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T16:19:35.4028347Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4064824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:35.4110405Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4151070Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T16:19:35.4193851Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4236405Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T16:19:35.4278749Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4316190Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T16:19:35.4357473Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4397279Z Entering 'third_party/kleidiai' 2025-12-04T16:19:35.4440470Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4478825Z Entering 'third_party/mimalloc' 2025-12-04T16:19:35.4522339Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4559670Z Entering 'third_party/nlohmann' 2025-12-04T16:19:35.4603102Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4642265Z Entering 'third_party/onnx' 2025-12-04T16:19:35.4684601Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4741927Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T16:19:35.4784378Z 
http.https://github.com/.extraheader 2025-12-04T16:19:35.4825854Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T16:19:35.4868200Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4908600Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T16:19:35.4949084Z http.https://github.com/.extraheader 2025-12-04T16:19:35.4985872Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T16:19:35.5027608Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5066647Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T16:19:35.5108847Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5145924Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T16:19:35.5188062Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5227980Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T16:19:35.5269336Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5307061Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T16:19:35.5348185Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5385134Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T16:19:35.5426618Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5462086Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:35.5506414Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5544947Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T16:19:35.5586607Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5626904Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T16:19:35.5668822Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5729554Z Entering 'third_party/pocketfft' 2025-12-04T16:19:35.5773987Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5812202Z Entering 
'third_party/protobuf' 2025-12-04T16:19:35.5855497Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5896106Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T16:19:35.5938326Z http.https://github.com/.extraheader 2025-12-04T16:19:35.5974980Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T16:19:35.6017629Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6056856Z Entering 'third_party/psimd' 2025-12-04T16:19:35.6099759Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6138281Z Entering 'third_party/pthreadpool' 2025-12-04T16:19:35.6180697Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6218159Z Entering 'third_party/pybind11' 2025-12-04T16:19:35.6261012Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6298833Z Entering 'third_party/python-peachpy' 2025-12-04T16:19:35.6341580Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6379730Z Entering 'third_party/sleef' 2025-12-04T16:19:35.6422220Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6459823Z Entering 'third_party/tensorpipe' 2025-12-04T16:19:35.6503409Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6540719Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T16:19:35.6582058Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6619693Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T16:19:35.6660681Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6697634Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T16:19:35.6738668Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6775529Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T16:19:35.6818794Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6854906Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T16:19:35.6896867Z http.https://github.com/.extraheader 2025-12-04T16:19:35.6957322Z [command]/usr/bin/git config --local --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:35.6991224Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T16:19:35.7335179Z Entering 'android/libs/fbjni' 2025-12-04T16:19:35.7363662Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T16:19:35.7382318Z Entering 'third_party/FP16' 2025-12-04T16:19:35.7411791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T16:19:35.7429981Z Entering 'third_party/FXdiv' 2025-12-04T16:19:35.7458375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T16:19:35.7476804Z Entering 'third_party/NNPACK' 2025-12-04T16:19:35.7506238Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T16:19:35.7524604Z Entering 'third_party/NVTX' 2025-12-04T16:19:35.7553856Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T16:19:35.7572927Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T16:19:35.7600828Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T16:19:35.7619450Z Entering 'third_party/XNNPACK' 2025-12-04T16:19:35.7648270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T16:19:35.7684507Z Entering 'third_party/aiter' 2025-12-04T16:19:35.7713595Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T16:19:35.7733168Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T16:19:35.7760925Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T16:19:35.7787661Z Entering 'third_party/benchmark' 2025-12-04T16:19:35.7816714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T16:19:35.7834913Z Entering 'third_party/composable_kernel' 2025-12-04T16:19:35.7863989Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T16:19:35.7890735Z Entering 'third_party/cpp-httplib' 2025-12-04T16:19:35.7919167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T16:19:35.7937338Z Entering 'third_party/cpuinfo' 2025-12-04T16:19:35.7966012Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T16:19:35.7985594Z Entering 'third_party/cudnn_frontend' 2025-12-04T16:19:35.8014386Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T16:19:35.8033429Z Entering 'third_party/cutlass' 2025-12-04T16:19:35.8062170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T16:19:35.8089635Z Entering 'third_party/fbgemm' 2025-12-04T16:19:35.8118136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T16:19:35.8138893Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T16:19:35.8166766Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T16:19:35.8184238Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T16:19:35.8211947Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T16:19:35.8239068Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T16:19:35.8267187Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T16:19:35.8285854Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T16:19:35.8313432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T16:19:35.8341944Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T16:19:35.8369257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T16:19:35.8386654Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T16:19:35.8415108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T16:19:35.8432101Z Entering 'third_party/fbgemm/external/json' 2025-12-04T16:19:35.8459370Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T16:19:35.8480121Z Entering 'third_party/flash-attention' 2025-12-04T16:19:35.8509110Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T16:19:35.8527785Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T16:19:35.8556178Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T16:19:35.8580463Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T16:19:35.8608270Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T16:19:35.8636116Z Entering 'third_party/flatbuffers' 2025-12-04T16:19:35.8665421Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T16:19:35.8686831Z Entering 'third_party/fmt' 2025-12-04T16:19:35.8715038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T16:19:35.8733341Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T16:19:35.8761791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T16:19:35.8780741Z Entering 'third_party/gloo' 2025-12-04T16:19:35.8808983Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T16:19:35.8827956Z Entering 'third_party/googletest' 2025-12-04T16:19:35.8857079Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T16:19:35.8875725Z Entering 'third_party/ideep' 2025-12-04T16:19:35.8903967Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T16:19:35.8920709Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T16:19:35.8948285Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T16:19:35.8974137Z Entering 'third_party/ittapi' 2025-12-04T16:19:35.9005119Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T16:19:35.9023105Z Entering 'third_party/kineto' 2025-12-04T16:19:35.9052297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config 
remote.origin.url 2025-12-04T16:19:35.9070366Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T16:19:35.9098253Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T16:19:35.9116442Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T16:19:35.9144229Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T16:19:35.9163710Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T16:19:35.9192505Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T16:19:35.9211437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T16:19:35.9239540Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T16:19:35.9258540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T16:19:35.9286645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T16:19:35.9303508Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T16:19:35.9331578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T16:19:35.9350848Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T16:19:35.9378674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T16:19:35.9396380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T16:19:35.9433071Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T16:19:35.9450816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T16:19:35.9479550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T16:19:35.9498602Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T16:19:35.9528044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T16:19:35.9545477Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T16:19:35.9573528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T16:19:35.9591489Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:35.9620212Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T16:19:35.9641020Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T16:19:35.9670567Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T16:19:35.9693213Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T16:19:35.9721763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T16:19:35.9740078Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T16:19:35.9769015Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T16:19:35.9789842Z Entering 'third_party/kleidiai' 2025-12-04T16:19:35.9820524Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T16:19:35.9839859Z Entering 'third_party/mimalloc' 2025-12-04T16:19:35.9868544Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T16:19:35.9887562Z Entering 'third_party/nlohmann' 2025-12-04T16:19:35.9916983Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T16:19:35.9936893Z Entering 'third_party/onnx' 2025-12-04T16:19:35.9966639Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T16:19:36.0005868Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T16:19:36.0034462Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T16:19:36.0055463Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T16:19:36.0085174Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T16:19:36.0106434Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T16:19:36.0134202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T16:19:36.0151452Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T16:19:36.0180498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T16:19:36.0198099Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T16:19:36.0226853Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T16:19:36.0244161Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T16:19:36.0272701Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T16:19:36.0291600Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T16:19:36.0319514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T16:19:36.0336738Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T16:19:36.0364564Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T16:19:36.0381887Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T16:19:36.0411176Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T16:19:36.0427382Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T16:19:36.0455984Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T16:19:36.0475493Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T16:19:36.0504874Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T16:19:36.0524069Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T16:19:36.0551669Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T16:19:36.0592845Z Entering 'third_party/pocketfft' 2025-12-04T16:19:36.0622142Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T16:19:36.0639641Z Entering 'third_party/protobuf' 2025-12-04T16:19:36.0668044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T16:19:36.0691389Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T16:19:36.0719012Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T16:19:36.0737059Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T16:19:36.0765528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 
2025-12-04T16:19:36.0785660Z Entering 'third_party/psimd' 2025-12-04T16:19:36.0814839Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T16:19:36.0833140Z Entering 'third_party/pthreadpool' 2025-12-04T16:19:36.0861793Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T16:19:36.0880385Z Entering 'third_party/pybind11' 2025-12-04T16:19:36.0909381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T16:19:36.0927907Z Entering 'third_party/python-peachpy' 2025-12-04T16:19:36.0957348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T16:19:36.0975695Z Entering 'third_party/sleef' 2025-12-04T16:19:36.1004735Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T16:19:36.1023258Z Entering 'third_party/tensorpipe' 2025-12-04T16:19:36.1051796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T16:19:36.1069566Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T16:19:36.1097162Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T16:19:36.1115599Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T16:19:36.1143241Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T16:19:36.1160756Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T16:19:36.1188056Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T16:19:36.1208474Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T16:19:36.1238286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T16:19:36.1255061Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T16:19:36.1283683Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T16:19:36.1325992Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1358078Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1386395Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1416326Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1445556Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1473753Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1504317Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1531613Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1562198Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1589956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1619043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1647867Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1676268Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1704454Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1733973Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1761517Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T16:19:36.1789727Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1818349Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1846632Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1874917Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1904323Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1932630Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1961031Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.1988179Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2016102Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2043616Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2070826Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2098606Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2128089Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2155140Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2182769Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2210414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2238670Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2266767Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2296347Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2325808Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2354151Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2382983Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2412954Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2442010Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2470674Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2498421Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2528342Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2556809Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2586285Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2635453Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2650338Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2679013Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2709130Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2737208Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2764979Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2792430Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2820590Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2847725Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2875497Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2903605Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2931360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2959616Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.2986911Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3015907Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3043984Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3071575Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3099810Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3128583Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3157236Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3183854Z [command]/usr/bin/git config 
--file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3211136Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3245177Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3273503Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3300597Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3328476Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3356344Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3384599Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3414424Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3443045Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3471001Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3498540Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3527013Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3556978Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3585725Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3614070Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T16:19:36.3726048Z A job completed hook has been configured by the self-hosted runner administrator 2025-12-04T16:19:36.3742578Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-12-04T16:19:36.3748694Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T16:19:36.3749148Z ##[endgroup] 2025-12-04T16:19:36.3847728Z [!ALERT!] Swap in detected! [!ALERT!] 2025-12-04T16:19:48.4526127Z [!ALERT!] Swap out detected [!ALERT!] 
2025-12-04T16:20:08.6707070Z Cleaning up orphan processes